Multiplexing device and demultiplexing device

ABSTRACT

The multiplexer (100) includes a first input unit (101) that obtains video data, a second input unit (104) that obtains audio data, a first analysis unit (103) that analyzes the video data and obtains video sample header information, a second analysis unit (106) that analyzes the audio data and obtains audio sample header information, a packetization part determination unit (107) that determines the packetization part of the video data based on the video sample header information and then determines the packetization part of the audio data so that its playback start time is the same or approximately the same as the playback start time of the video sample placed in the leading part of the packetization part of the video data, a packet header part generation unit (112) that generates the packet header part on the basis of the determined packetization part, a packet data generation unit (113) that generates the packet data part on the basis of the determined packetization part, and a packet connection unit (114) that generates a packet by connecting the generated packet header part to the packet data part.

TECHNICAL FIELD

The present invention relates to a multiplexer that multiplexes media data such as video data, audio data and the like, and to a demultiplexer that reads and demultiplexes a bit string where media data such as video data, audio data and the like are multiplexed.

BACKGROUND ART

The recent increase in the capacity of communication networks and the development of transmission techniques have remarkably popularized online video distribution services, which distribute video files of multimedia content including video, audio, text, still pictures and the like to personal computers. Also, the Third Generation Partnership Project (3GPP), an international standardization group that aims to standardize so-called third-generation mobile communication systems, is moving to define the transparent end-to-end packet switched streaming service (TS 26.234) as a standard related to wireless video distribution, and video distribution services are expected to be further provided to mobile communication terminals such as mobile phones and PDAs.

When distributing a video file in the video distribution service, a multiplexer reads media data such as video, still pictures, audio, text and the like, and multiplexes the header information necessary for playing back the media data with the entity data of the media data so as to generate video file data. The MP4 file format is attracting attention as the multiplex file format of this video file data.

This MP4 file format is a multiplex file format which is under standardization by ISO/IEC JTC1/SC29/WG11 (International Organization for Standardization/International Electrotechnical Commission), which is an international standardization group, and it is expected to become widespread because it is also employed by TS 26.234 of the above-mentioned 3GPP.

Here, the data structure of the MP4 file will be explained.

The MP4 file stores the header information and the entity data of media data in units of objects called boxes, and is made up of plural boxes that are arranged hierarchically.

FIG. 1 is a diagram for explaining the structure of a box included in a conventional MP4 file.

The box 901 is made of a box header part 902 where the header information of the box 901 is stored and a box data storage part 903 where data included in the box 901 (such as a sub-box of the box and a field for describing the information) is stored.

This box header part 902 has fields of a box size 904, a box type 905, a version 906 and a flag 907.

The box size 904 is the field describing the size information of the whole box 901, including the byte size assigned for this field.

The box type 905 is the field describing the identifier for identifying the type of the box 901. This identifier is generally represented by a string of four alphabetic characters. Note that there are cases where a box is referred to by this identifier in this specification.

The version 906 is the field where a version number showing the version of the box 901 is described, and the flag 907 is the field describing flag information that is set for each box 901. The version 906 and the flag 907 are not always necessary for all boxes 901, and a box 901 that does not have these fields may exist.
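The header layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field widths (a 32-bit big-endian box size, a four-byte box type, a one-byte version and a three-byte flag) are not given in this specification and are taken from the ISO base media file format, and the function name `parse_box_header` and the parameter `full_box` are hypothetical.

```python
import struct

def parse_box_header(buf, offset=0, full_box=False):
    """Read the box header part 902 of the box that starts at `offset`.

    Assumed layout (not stated in the text): 4-byte big-endian box size 904,
    4-byte box type 905, then optionally a 1-byte version 906 and a 3-byte
    flag 907. `full_box` tells the parser whether version/flag are present.
    Returns (size, box_type, version, flags, header_length).
    """
    size, raw_type = struct.unpack_from(">I4s", buf, offset)
    box_type = raw_type.decode("ascii")
    version = flags = None
    header_len = 8
    if full_box:
        (word,) = struct.unpack_from(">I", buf, offset + 8)
        version = word >> 24          # top byte
        flags = word & 0xFFFFFF       # lower three bytes
        header_len = 12
    return size, box_type, version, flags, header_len

# A box whose size field (16) counts its own header bytes, as described above.
example = struct.pack(">I4sI", 16, b"mfhd", 0) + bytes(4)
header = parse_box_header(example, full_box=True)
```

Because the box size 904 includes the header itself, the data stored in the box data storage part 903 occupies `size - header_length` bytes in this sketch.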

The MP4 file made of a series of boxes 901 that have this structure can be broadly divided into a basic part that is essential in the file structure and an extension part that is used as the need arises. First, the basic part of the MP4 file will be explained.

FIG. 2 is a diagram for explaining the basic part of a conventional MP4 file.

The basic part 911 of the MP4 file 910 is made of a file header part 912 and a file data part 913.

The file header part 912 is the part where header information of the whole file, such as the information on the compression coding method of the video data, is stored, and is made of a file type box 914 and a movie box 915.

The file type box 914 is a box identified by the identifier “ftyp” and stores the information for identifying the MP4 file. As a standardization group or a service provider can arbitrarily prescribe which media data is stored in the MP4 file and which compression coding method is used for the video data, the audio data and the like that are stored in the MP4 file, the information for identifying the prescription according to which the MP4 file is generated is stored in this file type box 914.

The movie box 915 is the box identified by the identifier “moov” and stores header information of the entity data stored in the file data part 913, such as a display duration.

The file data part 913 is made of a movie data box 916 identified by the identifier “mdat”. Note that it is also possible to refer to an external file that is different from this MP4 file 910 instead of this file data part 913. In the case of referring to the external file in this way, the basic part 911 of the MP4 file 910 is made essentially of the file header part 912. In this specification, the case where entity data is included in the MP4 file 910 will be explained, not the case where an external file is referred to.

The movie data box 916 is a box for storing the entity data of the media data in units called samples. A sample is the smallest access unit in the MP4 file and corresponds to a video object plane (VOP) of video data coded in the compression coding method of Moving Picture Experts Group-4 (MPEG-4) Visual, or to a frame of audio data.

Here, the lower hierarchy in the structure of the movie box 915 in the basic part of a conventional MP4 file will be explained.

FIG. 3 is a diagram for explaining the structure of the movie box in the conventional MP4 file.

As shown in FIG. 3A, the movie box 915 is made of the box header part 902 and the box data storage part 903 that have already been explained. The size information of the movie box 915 is described (“xxxx” in FIG. 3A) in the field of the box size 904 that constitutes the box header part 902, and the identifier “moov” of the movie box 915 is described in the field of the box type 905.

Also, the movie header box 917, where the header information of the basic part 911 of the MP4 file 910 is stored, and the track box 918, where the header information for each track such as the video track and the audio track is stored, are stored in the box data storage part 903 of the movie box 915. Note that a track here means the whole sample data of each medium included in the MP4 file 910, and the tracks of video, audio, text and the like are called a video track, an audio track, a text track and the like respectively. Also, in the case where a plurality of data items of the same medium are included in the MP4 file 910, a plurality of tracks exist for that medium. For example, in the case where two types of video data are included in the MP4 file 910, two video tracks exist.

The movie header box 917 is made of the box header part 902 and the box data storage part 903 that have already been explained. The size information of the movie header box 917 is described (“xxx” in FIG. 3A) in the field of the box size 904 that constitutes the box header part 902, and the identifier “mvhd” of the movie header box 917 is described in the field of the box type 905. Information on the duration needed for playing back the content included in the basic part 911 of the MP4 file 910 and the like is stored in the box data storage part 903 of the movie header box 917.

Also, the size information of the track box 918 (“xx” in FIG. 3A) is described in the field of the box size 904 that constitutes the box header part 902 of the track box 918, and the identifier “trak” of the track box 918 is described in the field of the box type 905. The track header box 919 is stored in the box data storage part 903 of the track box 918.

The track header box 919 is the box that has fields for describing the header information for each track and is identified by the identifier “tkhd”. A field for describing a track ID for identifying the track type, the information on the duration needed for playing back the track, or the like is stored in the box data storage part 903 of this track header box 919.

In this way, boxes 901 are arranged hierarchically in the movie box 915, and header information for each track for video, audio or the like is stored in the track box 918 that can be identified by “trak”. Header information in units of samples of the track is stored in the lower boxes included in this track box 918.

When the structure of the movie box 915 shown in FIG. 3A is shown as a tree, a diagram like FIG. 3B is obtained.

In other words, it is shown that a movie header box 917 and a track box 918 are arranged as lower boxes of the movie box 915, a track header box 919 is arranged as a lower box of the track box 918, and the boxes 901 are arranged hierarchically.
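The hierarchy of FIG. 3B can be reproduced with a short recursive walk. The following Python sketch is illustrative only: it assumes a 4-byte big-endian size followed by a 4-byte type for every box (a width taken from the ISO base media file format, not from this specification), fills “mvhd” and “tkhd” with dummy payloads, and uses the hypothetical helper names `make_box` and `walk_boxes`.

```python
import struct

# Boxes from FIG. 3 whose box data storage part 903 holds lower boxes.
CONTAINER_TYPES = {"moov", "trak"}

def make_box(box_type, payload=b""):
    """Build a box: size (including the 8 header bytes), type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload

def walk_boxes(buf, start=0, end=None, depth=0):
    """Yield (depth, box_type, size) for each box, descending into containers."""
    end = len(buf) if end is None else end
    offset = start
    while offset + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", buf, offset)
        if size < 8:
            break  # malformed size; stop rather than loop forever
        box_type = raw_type.decode("ascii")
        yield depth, box_type, size
        if box_type in CONTAINER_TYPES:
            yield from walk_boxes(buf, offset + 8, offset + size, depth + 1)
        offset += size

# moov -> (mvhd, trak -> tkhd), with 4 dummy payload bytes in the leaf boxes.
moov = make_box("moov",
                make_box("mvhd", bytes(4)) +
                make_box("trak", make_box("tkhd", bytes(4))))
tree = list(walk_boxes(moov))
for depth, box_type, size in tree:
    print("  " * depth + f"{box_type} ({size} bytes)")
```

The indentation printed for each box corresponds to its level in the tree of FIG. 3B.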

At the initial stage of standardizing the MP4 file format, the MP4 file 910 was made essentially of the above-mentioned basic part 911. However, the increase in the information amount of media data entails an increase in the file size, which produces various problems such as difficulty in applying the format to streaming playback. Thus, an improvement was made that additionally uses an extension part where a plurality of combinations of a header box and a data box are serially arranged.

FIG. 4 is a diagram showing the structure of a conventional MP4 file including an extension part.

As shown in FIG. 4, the MP4 file 920 to which the above-mentioned improvement is added is made of a basic part 911 and an extension part 921. As the MP4 file 920 including this extension part 921 can store all of the media data in the extension part 921, it is possible to omit the movie data box 916 of the MP4 file basic part 911.

The extension part 921 is made of a plurality of packets 922 that are divided on a basis of a predetermined part.

This packet 922 is made of a pair of a movie fragment box 923 and a movie data box 916, and is also called a movie fragment.

The movie data box 916 stores samples for each track on a basis of the above-mentioned predetermined part. The movie fragment box 923 is the box for storing the header information corresponding to this movie data box 916 and is identified by the identifier “moof”. The structure of this movie fragment box 923 will be explained more specifically.

FIG. 5 is a diagram for explaining the structure of a conventional movie fragment box.

As shown in FIG. 5, a movie fragment header box 924 and a plurality of track fragment boxes 925 are stored in the box data storage part 903 of the movie fragment box 923.

The movie fragment header box 924 is the box identified by the identifier “mfhd” and stores the header information of the whole movie fragment box 923.

The track fragment box 925 is the box identified by the identifier “traf” and stores the header information for each track.

Note that a single track fragment box 925 is generally prepared for the header information of a single track, but it is also possible to prepare a plurality of track fragment boxes 925 for the header information of a single track. When the header information of a single track is divided and stored in a plurality of track fragment boxes 925 in this way, the track fragment boxes 925 are arranged in ascending order of the decoding time of their leading samples.

Also, a track fragment header box 926 and one or more track fragment run boxes 927 are stored in the box data storage part 903 of this track fragment box 925.

The track fragment header box 926 is the box identified by the identifier “tfhd” and stores a field for describing the track ID for identifying the type of a track, or information on default values such as the playback duration of a sample and the like.

The track fragment run box 927 is the box identified by the identifier “trun” and stores the header information in units of samples. This track fragment run box 927 will be explained with reference to FIG. 6.

FIG. 6 is a diagram for explaining the structure of a conventional track fragment run box 927.

The flag 907 is the field describing flag information set for each box 901. Here, the flag information shows whether each of the fields that follow the flag 907, from the data offset 929 to the sample composition time offset 936, is included in the track fragment run box 927.

The sample count 928 is the field describing the number of samples whose header information items are stored in the track fragment run box 927.

The data offset 929 is the field describing the pointer information showing in which part of the paired movie data box 916 the entity data of the leading sample is stored, the leading sample being the first of the samples whose header information items are stored in the track fragment run box 927.

The leading sample flag 930 is the field whose value overrides the value of the field of the later-explained sample flag 935 in the case where the leading sample of the track fragment run box 927 is a randomly-accessible sample. Here, random access means the processing operation, in a playback apparatus of the MP4 file, of moving the playback location in the middle of playback to, for example, the location 10 minutes later, or of starting playback from a point in the middle of the data. In addition, a randomly-accessible sample is, among video samples, a sample that constitutes a frame that can be decoded by itself without referring to other frame data, that is, an intra coded frame (a so-called intra frame). Note that all audio samples are randomly accessible because every audio sample can be decoded by itself.

The table 931 is a collection of entries 932, each showing the header information items of a sample, and the number of entries 932 is equal to the number shown in the sample count 928.

The entry 932 is a collection of fields showing header information items for a sample, and the fields included are indicated by the above-mentioned flag 907. The fields included in the entry 932 are a sample duration 933 describing a sample playback duration, a sample size 934 describing a sample size, a sample flag 935 describing the flag information indicating whether the sample is randomly accessible or not, and a sample composition time offset 936 describing the differential value between the sample decoding time and the sample display time in order to handle samples using bidirectional prediction.

Note that, in the case where these fields are not included in the entry 932, the default values of these fields described in the track fragment header box 926 or in the movie extend box (identifier “mvex”) in the movie box 915 are used for each of the sample header information items.
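How the flag 907 selects the optional fields, and how default values fill in the omitted ones, can be illustrated with the following Python sketch. It is a simplified reading under stated assumptions: the bit positions of the flag and the 32-bit width of every field follow the ISO base media file format and are not given in this specification, the function name `parse_trun_payload` is hypothetical, and the `defaults` dictionary stands in for the values that would come from the track fragment header box 926 or the movie extend box.

```python
import struct

# Assumed bit assignments of the flag 907 for the track fragment run box.
DATA_OFFSET_PRESENT = 0x000001          # data offset 929
FIRST_SAMPLE_FLAGS_PRESENT = 0x000004   # leading sample flag 930
SAMPLE_DURATION_PRESENT = 0x000100      # sample duration 933
SAMPLE_SIZE_PRESENT = 0x000200          # sample size 934
SAMPLE_FLAGS_PRESENT = 0x000400         # sample flag 935
SAMPLE_CTO_PRESENT = 0x000800           # sample composition time offset 936

PER_SAMPLE_FIELDS = [
    ("duration", SAMPLE_DURATION_PRESENT),
    ("size", SAMPLE_SIZE_PRESENT),
    ("flags", SAMPLE_FLAGS_PRESENT),
    ("composition_time_offset", SAMPLE_CTO_PRESENT),
]

def parse_trun_payload(payload, flags, defaults):
    """Parse the bytes that follow the flag 907, starting at the sample count 928.

    Returns (data_offset, entries); each entry is a dict in which fields
    absent from the box keep the value given in `defaults`.
    """
    pos = 0
    (count,) = struct.unpack_from(">I", payload, pos)
    pos += 4
    data_offset = first_flags = None
    if flags & DATA_OFFSET_PRESENT:
        (data_offset,) = struct.unpack_from(">i", payload, pos)
        pos += 4
    if flags & FIRST_SAMPLE_FLAGS_PRESENT:
        (first_flags,) = struct.unpack_from(">I", payload, pos)
        pos += 4
    entries = []
    for index in range(count):
        entry = dict(defaults)                 # start from the default values
        for name, bit in PER_SAMPLE_FIELDS:
            if flags & bit:
                (entry[name],) = struct.unpack_from(">I", payload, pos)
                pos += 4
        if index == 0 and first_flags is not None:
            entry["flags"] = first_flags       # override for the leading sample
        entries.append(entry)
    return data_offset, entries

# Two samples; only the data offset and the per-sample size are present.
flags = DATA_OFFSET_PRESENT | SAMPLE_SIZE_PRESENT
payload = struct.pack(">IiII", 2, 16, 300, 250)
defaults = {"duration": 30, "size": 0, "flags": 0, "composition_time_offset": 0}
data_offset, entries = parse_trun_payload(payload, flags, defaults)
```

In this example the duration of both samples comes from the defaults, while the sizes (300 and 250 bytes) are read from the entries themselves.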

Also, the header information items of samples are described in the track fragment run box 927 in the order of decoding time. Therefore, when the apparatus that plays back the MP4 file searches for a sample header information item, it refers to the track IDs in the track fragment header boxes 926, starting from the leading track fragment box 925 in the file, to find the track fragment box 925 that includes the header information of the track to be obtained, and then searches for the header information of the sample starting from the leading track fragment run box 927 in that track fragment box 925.

Note that, in the case of the MP4 file 920 including this extension part 921, the information necessary for the whole track, such as the initial information at the time of decoding, is stored in the movie box 915.

Next, a structure example of the MP4 file including the extension part 921 having this structure will be explained.

FIG. 7 is a diagram showing a structure example of the extension part of the conventional MP4 file including the extension part.

In FIG. 7, two examples of how a content is stored will be explained, where the content playback duration is 60 seconds.

The MP4 file 940 shown in FIG. 7A has the structure of storing media data in both the basic part 941 and the extension part 942. In other words, the part of the media data from 0 to 30 seconds is stored in mdat_1 (code 945) of the basic part 941, the part of the media data from 30 to 45 seconds is stored in mdat_2 (code 947) of the extension part 942, and the part of the media data from 45 to 60 seconds is stored in mdat_3 (code 949). In addition, the header information of mdat_1 (code 945) is stored in moov 944, the header information of mdat_2 (code 947) is stored in moof_1 (code 946), and the header information of mdat_3 (code 949) is stored in moof_2 (code 948).

In contrast, the MP4 file 950 shown in FIG. 7B has the structure of storing the media data in the extension part 952 only. In other words, the basic part 951 is made of ftyp 953 and moov 954 and does not include any mdat, the part of the media data from 0 to 30 seconds is stored in mdat_1 (code 956) in the extension part 952, and the part of the media data from 30 to 60 seconds is stored in mdat_2 (code 958). In addition, the header information of mdat_1 (code 956) is stored in moof_1 (code 955), and the header information of mdat_2 (code 958) is stored in moof_2 (code 957).

Here, how the extension part of the above-mentioned MP4 file is generated will be explained with reference to FIG. 8 to FIG. 10.

FIG. 8 is a block diagram showing the structure of the conventional multiplexer.

The multiplexer 960 is an apparatus that multiplexes the media data and generates the extension part data of the MP4 file. Here, the extension part data of the MP4 file is generated by multiplexing video data and audio data.

The first input unit 961 captures video data into the multiplexer 960 and has the first data storage unit 962 store the video data. Also, the second input unit 964 captures audio data into the multiplexer 960 and has the second data storage unit 965 store the audio data.

The first analysis unit 963 reads out samples of video data one by one from the first data storage unit 962 so as to analyze them and outputs the header information items of the video samples to the packetization part determination unit 967. Also, the second analysis unit 966 reads out samples of audio data one by one from the second data storage unit 965 so as to analyze them and outputs the header information items of the audio samples to the packetization part determination unit 967. These header information items of video samples and audio samples include the information indicating the sizes or the playback durations of the samples, and the header information items of video samples also include the information showing whether the video samples are intra frames or not.

The packetization part determination unit 967 determines the packetization parts of the video data and the audio data so that the number of samples included in each packet becomes constant, and generates the header information items of the respective packets based on the obtained sample header information items.

FIG. 9 shows the processing operation flow of the conventional packetization part determination unit. Here, the number of samples stored in a packet is N, and the predetermined value of N is stored in a memory or the like of the multiplexer 960.

First, when the first analysis unit 963 obtains a video sample (S901) and outputs the video sample header information to the packetization part determination unit 967, the packetization part determination unit 967 adds the video sample header information to a packet generation table (S902).

Next, the packetization part determination unit 967 updates the number of video samples included in the packet (S903) and judges whether the number of the video samples included in the packet has become N or not (S904).

Here, the above-mentioned processing from S901 to S903 is repeated in the case where the number of video samples included in the packet has not reached N (No in S904). In the case where the number has reached N (Yes in S904), the packetization part determination unit 967 packetizes the N video samples and finishes the processing operation (S905).

Likewise, the packetization part determination unit 967 packetizes the audio samples by performing the processing operation of the above-mentioned S901 to S905.

After that, the packetization part determination unit 967 repeats the processing operation of this flow until all the samples have been packetized.
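The flow of S901 to S905 amounts to cutting the sample sequence every N samples. A minimal Python sketch is shown below; the function name `packetize_fixed_count`, the tuple layout of a sample (size in bytes, playback duration in ms, intra flag) and the handling of a final packet with fewer than N samples are assumptions made for illustration, not details taken from this specification.

```python
def packetize_fixed_count(samples, n):
    """Group sample header items into packets of exactly n samples each.

    A trailing group with fewer than n samples is emitted as a last,
    shorter packet (an assumption; the text does not cover this case).
    """
    packets = []
    table = []                      # plays the role of the packet generation table
    for sample in samples:          # S901: obtain the next sample
        table.append(sample)        # S902: add its header information
        count = len(table)          # S903: update the sample count
        if count == n:              # S904: has the count reached N?
            packets.append(table)   # S905: packetize N samples
            table = []
    if table:
        packets.append(table)
    return packets

# (size_bytes, duration_ms, is_intra) for five hypothetical video samples.
video = [(300, 30, False), (1200, 30, True), (280, 30, False),
         (310, 30, False), (1100, 30, True)]
packets = packetize_fixed_count(video, 2)
intra_positions = [[i for i, s in enumerate(p) if s[2]] for p in packets]
```

Because the cut points depend only on the sample count, the position of an intra frame inside a packet differs from packet to packet (second, absent, and first in this example).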

FIG. 10 shows an example of the conventional packet generation table that stores the header information items of the video samples. This packet generation table 968a describes, for each of the video samples, the size of the sample, the sample playback duration, and the intra coded frame flag showing whether the video sample is an intra frame or not. Here, the table shows that the leading video sample stored in the packet has a size of 300 bytes and a playback duration of 30 ms and is not an intra coded frame, and that the second video sample is an intra coded frame. In addition, these information items are added to the packet generation table 968a in sequence in the packetization part determination unit 967 until the information of the “N”th sample, which is the last sample included in the packet, has been added, at which time the table is outputted to the packet generation table storage unit 968.

Referring to FIG. 8 again, after describing the header information items of N samples in the packet generation table 968a, the packetization part determination unit 967 outputs the packet generation table 968a to the packet generation table storage unit 968 and a packet generation signal to the packet header generation unit 969.

The packet header generation unit 969, when obtaining the packet generation signal, reads out the packet sample header information from the packet generation table 968a that is held in the packet generation table storage unit 968 and generates moof data. Also, the packet header generation unit 969 outputs the generated moof data to the packet connection unit 971 and outputs, to the packet data generation unit 970, the mdat information including (i) pointer information indicating which parts of the first data storage unit 962 and the second data storage unit 965 store the entity data items of the samples included in the packet and (ii) the size information items of the samples.

The packet data generation unit 970 reads out the entity data items of the samples from the first data storage unit 962 and the second data storage unit 965 based on the obtained mdat information so as to generate mdat data, and outputs the mdat data to the packet connection unit 971.

After that, the packet connection unit 971 connects the moof data with the mdat data so as to output the data of the MP4 extension part for a single packet.

Finally, the outputted MP4 extension part data for a single packet is captured into an apparatus that generates the MP4 file, and the data items of the MP4 extension part that are generated in sequence are arranged in sequence so that the extension part of the MP4 file is generated. After that, this file generation apparatus connects the basic part with the extension part of the MP4 file so as to generate an MP4 file.

However, when the extension part of an MP4 file that is multiplexed by such a conventional multiplexer is played back, there are the problems listed below.

As a conventional multiplexer multiplexes data without considering the playback start times of the samples included in a packet, there is a case where an audio sample that is synchronized with a video sample having a certain playback time is stored in a packet different from the packet storing that video sample. This causes a problem that the efficiency of data access deteriorates when the playback apparatus plays back the MP4 file.

Also, as a conventional multiplexer multiplexes data based on the number of samples included in a packet, randomly-accessible samples, that is, video samples corresponding to intra frames, are in most cases stored at a different position in each packet. Therefore, there is a problem that the calculation amount needed for searching samples becomes huge because the MP4 file playback apparatus must search all the video samples included in a packet when searching for randomly-accessible samples.

These problems will be explained in detail with reference to FIG. 11.

FIG. 11 is a diagram for explaining the problems of a conventional multiplexer.

FIG. 11A illustrates the first problem that the efficiency of data access deteriorates during playback.

The header information items of the samples included in each mdat are stored in the moof immediately before that mdat. The header information item concerning the video sample of playback start time 20 s stored in mdat_1 is stored in moof_1 as the leading sample, and the header information item concerning the audio sample of playback start time 20 s stored in mdat_10 is stored in moof_10 as the last sample.

Therefore, when trying to play back the content from the point of 20 seconds in playback time, the MP4 file playback apparatus must search data up to moof_10 between obtaining the header information item of the video sample stored in moof_1 and obtaining the header information item of the audio sample, which makes the efficiency of data access deteriorate.

FIG. 11B illustrates the second problem that the calculation amount needed for searching for randomly-accessible samples becomes huge.

The header information item concerning the “i”th randomly-accessible video sample, which is stored in the last part of mdat_1, is stored as the last sample in moof_1, and the header information item concerning the “i+1”th randomly-accessible video sample, which is stored in the last part of mdat_3, is stored as the last sample in moof_3.

Therefore, the MP4 file playback apparatus must search up to the last sample of each moof when trying to perform random access, and thus the calculation amount necessary for searching becomes huge.

Further, in addition to the first and second problems, the number of seeks for obtaining the sample data becomes large under the structure of the extension part of the MP4 file generated by the conventional multiplexer, and thus there is another problem that the file is not appropriate for random access playback in an apparatus which has a slow seek speed, such as an optical disc playback apparatus.

This problem will be explained with reference to FIG. 11B again. In the case of trying to perform random access to the “i”th randomly-accessible video sample of moof_1, the playback apparatus first moves a reading pointer to the leading point of moof_1 in order to obtain the header information item of the “i”th randomly-accessible video sample and then analyzes the data in moof_1 in sequence. At this time, the first seek becomes necessary.

After that, the playback apparatus obtains the information as to which part of mdat_1 stores the entity data of the “i”th randomly-accessible video sample and moves the reading pointer to the starting position of the entity data. At that time, as the entity data of the “i”th randomly-accessible video sample is stored at the end of mdat_1, it is impossible to obtain the entity data of the sample by moving the reading pointer in sequence from the leading position of moof_1, and thus the second seek becomes necessary.

In other words, as a seek operation is performed both when moving the reading pointer to the leading location of moof_1 and when moving it to the starting position of the entity data, it takes a lot of time to perform random access playback in the case where the playback apparatus has a slow seek speed. Especially, in the case where the entity data item of an audio sample or the like that is synchronized with the “i”th randomly-accessible video sample is stored in a place away from the entity data of the video sample, such as a different packet, an additional seek operation becomes necessary and it is impossible to perform immediate random access playback.

The present invention is conceived considering these problems, and an object of the present invention is to provide a multiplexer which achieves a high efficiency of data access at the time of playing back a multiplexed media data file and which can multiplex media data so that the calculation amount needed for searching samples can be reduced.

Also, another object is to provide a multiplexer which can multiplex media data so that an apparatus with a slow seek speed can perform random access playback of a multiplexed file.

Further, another object is to provide a demultiplexer which can obtain the file multiplexed by the multiplexer and demultiplex the multiplexed file.

DISCLOSURE OF INVENTION

In order to achieve the above-mentioned object, the multiplexer in the present invention generates multiplexed data by multiplexing packets of media data including image data and at least one of audio data and text data, comprising: a media data obtainment unit operable to obtain the media data; an analysis unit operable to analyze the media data obtained by the media data obtainment unit and obtain playback start time information that indicates a playback start time of a sample that is a smallest access unit of the image data, audio data and text data included in the media data; a packetization part determination unit operable to determine, based on the playback start time information obtained by the analysis unit, a packetization part of the media data in a way that playback start times of respective samples of the image data, audio data and text data that are included in the media data are made to be the same or approximately the same; a packet header part generation unit operable to generate a packet header part that holds a header of the media data on a basis of the packetization part determined by the packetization part determination unit; a packet data part generation unit operable to generate a packet data part that holds entity data of the media data on a basis of the packetization part determined by the packetization part determination unit; and a packetization unit operable to generate a packet by connecting the packet header part generated by the packet header part generation unit with the packet data part generated by the packet data part generation unit.

In this way, the image data, audio data and text data included in the media data are stored in a packet with playback start times that are the same or approximately the same, which makes it possible to improve the data access efficiency of the playback apparatus in playback.
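One possible reading of this alignment is sketched below in Python. It assumes that the packetization parts of the video data have already been determined and are represented only by the playback start time of their leading video sample, and that each audio sample is assigned to the packet whose leading video start time is the latest one not later than the audio sample's own start time. This assignment rule, the function name `align_audio_to_video` and the millisecond time unit are illustrative assumptions, not the determination procedure of this specification.

```python
from bisect import bisect_right

def align_audio_to_video(video_packet_starts, audio_starts):
    """Assign audio sample start times to video packets by start time.

    video_packet_starts: ascending start times (ms) of the leading video
    sample of each packet. audio_starts: ascending audio start times (ms).
    Returns one list of audio start times per video packet.
    """
    groups = [[] for _ in video_packet_starts]
    for t in audio_starts:
        # Index of the last packet whose leading video sample starts at or before t.
        index = max(bisect_right(video_packet_starts, t) - 1, 0)
        groups[index].append(t)
    return groups

# Video packets start at 0 s, 1 s and 2 s; audio frames last 300 ms each.
audio_groups = align_audio_to_video([0, 1000, 2000], list(range(0, 3000, 300)))
```

With audio frames that do not divide the packet duration evenly, the leading audio sample of a packet starts only approximately at the video start time (1200 ms against 1000 ms in the second packet of this example), which corresponds to the "approximately the same" case mentioned above.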

Also, in the multiplexer in the present invention, it is preferable that the image data is video data, that the analysis unit further analyzes the video data obtained by the media data obtainment unit and obtains intra frame information in the case where the video data includes at least one sample including the intra frame information indicating that the sample is an intra coded sample, and that the packetization part determination unit, in the case where the analysis unit obtains the intra frame information, determines the packetization part of the media data based on the intra frame information and the playback start time information and places the sample of the video data including the intra frame information in the leading part of the packetization part.

In this way, as the leading video sample included in a packet becomes the video sample of an intra frame, it is possible to greatly reduce the calculation amount needed for searching samples when the playback apparatus performs random access.

Further, in the multiplexer in the present invention, the packet data part generation unit preferably generates the packet data part that stores the samples of the media data items included in the packetization part by interleaving them in a way that the playback start times of the samples are in an ascending order.

In this way, as the video samples and the audio samples are stored in mdat in ascending order of their playback start times, it is possible to reduce the number of seek operations when the playback apparatus performs random access, which enables even a playback apparatus with a slow seek speed to realize immediate random access playback.
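The interleaving described above can be illustrated by the following minimal sketch. It assumes that each sample is represented by a hypothetical (playback start time, label) pair and that the video samples and the audio samples are each already sorted by playback start time; the function name interleave is not part of this description.

```python
import heapq


def interleave(video, audio):
    """Merge two lists of (start_time, label) pairs into ascending start-time order.

    Both inputs are assumed to be sorted by start_time already.
    """
    return list(heapq.merge(video, audio, key=lambda sample: sample[0]))


video = [(0, "V0"), (10, "V1")]
audio = [(1, "A0"), (5, "A1"), (12, "A2")]
order = [label for _, label in interleave(video, audio)]
```

With these inputs the samples would be written to mdat in the order V0, A0, A1, V1, A2, so that a reader moving forward through mdat never has to seek backwards.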

Note that the present invention can be realized not only as such a multiplexer but also as a multiplexing method having the characteristic units of the multiplexer as steps, or as a program that causes a computer to execute these steps. Such a program can be distributed via a recording medium such as a CD-ROM or a communication medium such as the Internet.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram for explaining the structures of boxes that constitute a conventional MP4 file;

FIG. 2 is a diagram for explaining the basic part of the conventional MP4 file;

FIG. 3A is a diagram for explaining the structure of a movie box in the conventional MP4 file;

FIG. 3B is a tree-shaped diagram showing the structure of the movie box in the conventional MP4 file;

FIG. 4 is a diagram showing the structure of the MP4 file including the conventional extension part;

FIG. 5 is a diagram for explaining the structure of the conventional movie fragment box;

FIG. 6 is a diagram for explaining the structure of a conventional track fragment run box;

FIG. 7A is a diagram showing the first structural example of the MP4 file including the conventional extension part;

FIG. 7B is a diagram showing the second structural example of the MP4 file including the conventional extension part;

FIG. 8 is a block diagram showing the structure of the conventional multiplexer;

FIG. 9 is a flow chart showing the processing operation of a conventional packet unit determination unit;

FIG. 10 is a diagram showing an example of a packet generation table that stores a header information item of a conventional video sample;

FIG. 11A is a diagram for explaining the first problem of the conventional multiplexer;

FIG. 11B is a diagram for explaining the second problem of the conventional multiplexer;

FIG. 12 is a block diagram showing the functional structure of the multiplexer in the first embodiment of the present invention;

FIG. 13 is a flow chart showing the processing operation of the multiplexer;

FIG. 14 is a flow chart showing the processing operation of a video packetization part determination unit;

FIG. 15 is a flow chart showing the processing operation of an audio packetization part determination unit;

FIG. 16A is a diagram showing the first example of the data structure of the MP4 file extension part generated by the multiplexer;

FIG. 16B is a diagram showing the second example of the data structure of the MP4 file extension part generated by the multiplexer;

FIG. 17 is a block diagram showing the functional structure of the packetization part determination unit of the multiplexer in a second embodiment;

FIG. 18 is a flow chart showing the first processing operation of the video packetization part determination unit;

FIG. 19 is a flow chart showing the second processing operation of the video packetization part determination unit;

FIG. 20A is a diagram showing the first example of the data structure of the MP4 file extension unit generated by the multiplexer;

FIG. 20B is a diagram showing the second example of the data structure of the MP4 file extension unit generated by the multiplexer;

FIG. 21 is a block diagram showing the functional structure of the packet data generation unit of the multiplexer in a third embodiment;

FIG. 22 is a flow chart showing the processing operation of the packet data generation unit;

FIG. 23 is a diagram showing the outline of the data structure of the MP4 file extension part generated by the multiplexer;

FIG. 24 is a diagram showing the first example of the data structure of the MP4 file extension unit generated by the multiplexer;

FIG. 25 is a diagram showing the second example of the data structure of the MP4 file extension unit generated by the multiplexer;

FIG. 26 is a block diagram showing the functional structure of a demultiplexer in a fourth embodiment;

FIG. 27 is a flow chart showing the processing operation of the demultiplexer; and

FIG. 28 is a diagram showing an application of the multiplexer in the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiments of the present invention will be explained below with reference to the figures.

Note that MPEG-4 Visual coded data is used as the video data and MPEG-4 Audio coded data is used as the audio data in these embodiments. Also, the embodiments mainly explain an apparatus that multiplexes video data and audio data, but there is no intention of excluding the multiplexing of other media data such as text data.

First Embodiment

First, the multiplexer in the first embodiment of the present invention will be explained with reference to FIG. 12 to FIG. 16.

FIG. 12 is a block diagram showing the functional structure of the multiplexer in the first embodiment of the present invention.

This multiplexer 100 is an apparatus that generates the MP4 file that contains an extension part by multiplexing video data and audio data, and includes the first input unit 101, the first data storage unit 102, the first analysis unit 103, the second input unit 104, the second data storage unit 105, the second analysis unit 106, a packetization part determination unit 107, a packet generation table storage unit 111, a packet header generation unit 112, a packet data generation unit 113 and a packet connection unit 114.

The first input unit 101 is an interface that captures the video data coded by an image coding apparatus or the like into the multiplexer 100, and causes the first data storage unit 102 to store the obtained video input data items in sequence.

The first data storage unit 102 is a cache memory, a random access memory (RAM) or the like that temporarily stores the video input data.

The first analysis unit 103 is a processing unit that reads out the video sample data, that is, the data of a single video sample, from among the video input data items stored in the first data storage unit 102, analyzes the video sample data and outputs the header information of the video sample. The first analysis unit 103 is realized by a CPU, a memory or the like. Note that the header information of the video sample outputted by this first analysis unit 103 includes the size of the video sample, the playback duration and the information indicating whether it is an intra frame or not. Further, the header information of this video sample includes the differential information between the decoding time and the display time in the case where the sample uses inter prediction.

The second input unit 104 is an interface that captures the audio data coded by an audio coding apparatus or the like into the multiplexer 100, and causes the second data storage unit 105 to store the obtained audio input data items in sequence.

The second data storage unit 105 is a cache memory, a RAM or the like that temporarily stores the audio input data.

The second analysis unit 106 is a processing unit that reads out the audio sample data, that is, the data of a single audio sample, from among the audio input data items stored in the second data storage unit 105, analyzes the audio sample data and outputs the header information of the audio sample. The second analysis unit 106 is realized by a CPU, a memory or the like. Note that the header information of the audio sample outputted by this second analysis unit 106 includes the size of the audio sample and the information indicating the playback duration.
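For illustration only, the sample header information outputted by the first analysis unit 103 and the second analysis unit 106 could be modelled as follows. The class names, the field names and the helper playback_start_times are assumptions introduced for this sketch; the description specifies only which items the header information contains. Durations are taken to be integers in an arbitrary time unit.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Optional


@dataclass(frozen=True)
class VideoSampleHeader:
    size: int                         # size of the video sample in bytes
    duration: int                     # playback duration
    is_intra: bool                    # True when the sample is an intra frame
    cts_offset: Optional[int] = None  # decoding/display time difference (inter prediction only)


@dataclass(frozen=True)
class AudioSampleHeader:
    size: int      # size of the audio sample in bytes
    duration: int  # playback duration


def playback_start_times(headers, first_start: int = 0) -> List[int]:
    """Start time of each sample: first_start plus the durations of all earlier samples."""
    running = accumulate((h.duration for h in headers), initial=first_start)
    return list(running)[:-1]  # drop the final value, which is the end time of the last sample


audio_headers = [AudioSampleHeader(10, 4), AudioSampleHeader(10, 4), AudioSampleHeader(12, 6)]
starts = playback_start_times(audio_headers, first_start=20)  # [20, 24, 28]
```

The playback start time is derived here by accumulating playback durations, which is the counting that the packetization part determination unit is described as performing.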

The packetization part determination unit 107 is a processing unit that integrates the header information of the video samples and the audio samples included in a packet and determines the packetization part of the video data and the audio data in a way that the playback start time of the video sample included in the packet becomes the same or approximately the same as the playback start time of the audio sample, and is realized by a CPU, a memory or the like. Also, the packetization part determination unit 107 outputs the collection of sample header information items for a determined packetization part to the packet generation table storage unit 111 as a packet generation table, and outputs, to the packet header generation unit 112, a packet generation signal that instructs generating packet headers after the packetization part is determined. This packetization part determination unit 107 includes a time adjustment unit 108 that adjusts packetization parts by the total duration of samples in a packet, a video packetization part determination unit 109 that determines packetization parts of video data and an audio packetization part determination unit 110 that determines packetization parts of audio data.

The time adjustment unit 108 is a processing unit that adjusts the end time of a packet so that the packet finishes within a predetermined time unit. This time adjustment unit 108 first outputs a predetermined time (target time) to the video packetization part determination unit 109. Note that a user may specify this target time. In this case, the multiplexer 100 obtains a specification of the target time via an input apparatus such as a keyboard, and outputs a target time input signal showing the target time specified via the input apparatus to the time adjustment unit 108.

The video packetization part determination unit 109 is a processing unit that obtains the video sample header information from the first analysis unit 103 and determines the packetization part of the video data.

This video packetization part determination unit 109 obtains a target time from the time adjustment unit 108 and video sample header information from the first analysis unit 103. Counting the playback duration of each video sample included in each of the video sample header information items, it adds header information items to the video packet generation table, up to the header information item of the last video sample included in the packet, in a way that the video data in the packet finishes within the target time. After adding the header information item of the last video sample included in the packet, the video packetization part determination unit 109 outputs, to the audio packetization part determination unit 110, the video sample playback time information showing the playback start time of the first video sample included in the packet and the total of the playback durations of the video samples included in the packet.

The audio packetization part determination unit 110 is a processing unit that obtains the audio sample header information from the second analysis unit 106 and determines the packetization part of the audio data.

This audio packetization part determination unit 110 obtains the video sample playback time information from the video packetization part determination unit 109 and the audio sample header information from the second analysis unit 106. It places, at the leading part of the packet, the audio sample whose playback start time is the same or approximately the same as the playback start time of the leading video sample included in the packet. Then, counting the playback duration of each audio sample included in each audio sample header information item, it determines the last audio sample included in the packet so that the total of the playback durations of the audio samples included in the packet becomes the same or approximately the same as the total of the playback durations of the video samples included in the packet.

Note that an audio sample whose playback start time is the same or approximately the same as the playback start time of the video sample here is either the audio sample with the earliest playback start time after the playback start time of the video sample or the audio sample with the last playback start time before the playback start time of the video sample.
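The selection rule in the preceding note can be sketched as follows. The function name leading_audio_index and the flag prefer_after are assumptions for illustration; the description allows either neighbouring audio sample and does not say how to choose between them. The list audio_starts is assumed to be sorted in ascending order and non-empty.

```python
from bisect import bisect_left


def leading_audio_index(audio_starts, video_start, prefer_after=True):
    """Index of the audio sample treated as starting at (approximately) video_start."""
    i = bisect_left(audio_starts, video_start)  # first index with start >= video_start
    if i < len(audio_starts) and audio_starts[i] == video_start:
        return i  # exact match
    if prefer_after:
        # earliest audio sample after the video sample (fall back to the last sample)
        return min(i, len(audio_starts) - 1)
    # last audio sample before the video sample (fall back to the first sample)
    return max(i - 1, 0)


audio_starts = [0, 23, 46, 69]
after = leading_audio_index(audio_starts, 40)                       # 2 (start time 46)
before = leading_audio_index(audio_starts, 40, prefer_after=False)  # 1 (start time 23)
```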

Then, the audio packetization part determination unit 110 adds the header information items of the audio samples, from the leading audio sample to the last audio sample included in the packet, to the audio packet generation table in sequence.

The packet generation table storage unit 111 is a cache memory, a RAM or the like that temporarily stores a video packet generation table and an audio packet generation table that are outputted from the packetization part determination unit 107.

The packet header generation unit 112 is the processing unit that generates a packet header part (moof) that stores the header information items of a packet, and is realized by a CPU, a memory or the like.

This packet header generation unit 112 obtains a packet generation signal from the packetization part determination unit 107, reads out the sample header information of the packet from the packet generation table storage unit 111 by referring to the packet generation table, generates moof data and outputs it to the packet connection unit 114.

Also, the packet header generation unit 112 outputs, to the packet data generation unit 113, mdat information that includes pointer information showing at which parts of the first data storage unit 102 and the second data storage unit 105 the entity data items of the video samples and the audio samples included in a packet are stored, sample size information showing the size of each sample, and a signal that instructs generating a packet data part (mdat).

Note that, at the time of generating moof, this packet header generation unit 112 can store the header information items of media data coded using a coding method in which the coding rate is switched in the middle of the data, such as the adaptive multi-rate (AMR) codec, in a different traf for each coding rate.

The packet data generation unit 113 is the processing unit that generates a packet data part (mdat) where the entity data of a packet is stored, and is realized by a CPU, a memory or the like.

This packet data generation unit 113 obtains the mdat information from the packet header generation unit 112, reads out the video entity data of the video samples included in a packet from the first data storage unit 102 based on the pointer information and the sample size information included in the mdat information, reads out the audio entity data of the audio samples included in the packet from the second data storage unit 105 in the same manner, generates mdat data and outputs it to the packet connection unit 114.

The packet connection unit 114 is the processing unit that connects moof data with mdat data and generates mp4 extension data for a single packet, and is realized by a CPU, a memory or the like. This packet connection unit 114 obtains moof data from the packet header generation unit 112, obtains mdat data from the packet data generation unit 113, generates the mp4 extension data for a single packet by connecting the moof data with the mdat data, and outputs the mp4 extension data items that are generated in sequence to the apparatus that generates the MP4 file.
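The connection of moof data with mdat data can be pictured with the minimal sketch below. It assumes the standard box layout of the ISO base media file format, in which each box starts with a 32-bit big-endian size (including the 8-byte header) followed by a four-character type. The function names box and connect_packet are hypothetical, and the moof payload is treated as opaque bytes although a real moof holds child boxes such as traf and trun.

```python
import struct


def box(box_type: bytes, payload: bytes) -> bytes:
    """Wrap payload in a box: 4-byte big-endian size, 4-byte type, then the payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def connect_packet(moof_payload: bytes, mdat_payload: bytes) -> bytes:
    """One packet of the extension part: the moof box immediately followed by the mdat box."""
    return box(b"moof", moof_payload) + box(b"mdat", mdat_payload)


packet = connect_packet(b"\x00" * 4, b"abc")
```

In this sketch the packet is 23 bytes long: a 12-byte moof box followed by an 11-byte mdat box.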

The processing procedure of generating an extension part of an MP4 file in the multiplexer 100 constituted like this will be explained with reference to FIG. 13.

FIG. 13 is a flow chart showing the processing operation of the multiplexer 100.

First, the first input unit 101 and the second input unit 104 read video data and audio data into the multiplexer 100 (S100), the first input unit 101 causes the first data storage unit 102 to store the video input data, and the second input unit 104 causes the second data storage unit 105 to store the audio input data.

Next, the first analysis unit 103 reads out the video sample data from the first data storage unit 102 so as to analyze it and outputs the video sample header information to the video packetization part determination unit 109 of the packetization part determination unit 107. After that, the video packetization part determination unit 109 determines the packetization part of the video data based on the video sample header information obtained from the first analysis unit 103 and the target time obtained from the time adjustment unit 108 (S110). Note that the processing operation of determining the packetization part of the video data by the video packetization part determination unit 109 will be explained in detail later.

After that, the video packetization part determination unit 109 outputs the playback time information of the video sample included in the packet whose packetization part is determined to the audio packetization part determination unit 110 (S120).

After that, the audio packetization part determination unit 110 determines the packetization part of the audio data based on the playback time information of the video sample obtained from the video packetization part determination unit 109 (S130). At this time, the audio packetization part determination unit 110 determines the packetization part so that the playback start time of the leading audio sample included in the packet becomes the same or approximately the same as the playback start time of the leading video sample included in the packet.

When the audio packetization part determination unit 110 determines the packetization part of the audio data, the packetization part determination unit 107 outputs a packet generation table to the packet generation table storage unit 111 and outputs a packet generation signal to the packet header generation unit 112.

After that, the packet header generation unit 112 generates moof data on a basis of the determined part so as to output it to the packet connection unit 114. The packet data generation unit 113 generates mdat data on a basis of the determined part so as to output it to the packet connection unit 114. The packet connection unit 114 connects moof data with mdat data so as to generate a single packet on a basis of the determined part (S140) and outputs it as mp4 extension data for a single packet.

After generating the packet, the multiplexer 100 judges whether data to be inputted is left in the first input unit 101 and the second input unit 104 (S150). Here, in the case where there is input data (No in S150), the multiplexer 100 clears the data that has already been packetized from the buffer memories, that is, the first data storage unit 102, the second data storage unit 105 and the packet generation table storage unit 111 (S160), and repeats the processing operation from the above-mentioned S110 to S150.

On the other hand, in the case where there is no input data (Yes in S150), the multiplexer 100 finishes the generation processing of the extension part of the MP4 file.
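The overall flow from S110 to S160 can be sketched as follows, under simplifying assumptions: every sample is represented only by its playback duration, both streams start at time 0, the leading audio sample of a packet is taken to be the earliest one starting at or after the leading video sample, and the loop ends when the video input is exhausted. The function name multiplex and this data model are hypothetical.

```python
def multiplex(video, audio, target_time):
    """Split two lists of sample durations into packets of (video_part, audio_part)."""
    packets = []
    v = a = 0     # indices of the next samples that are not yet packetized
    v_time = 0    # playback start time of video[v]
    a_time = 0    # playback start time of audio[a]
    while v < len(video):                                    # S150: input data left?
        # S110: add video samples until the total duration reaches the target time
        v_end, v_total = v, 0
        while v_end < len(video) and v_total < target_time:
            v_total += video[v_end]
            v_end += 1
        # S120: hand the start time and total duration of the video part to the audio side
        packet_end_time = v_time + v_total
        # S130: add audio samples until they cover the same playback period
        a_end = a
        while a_end < len(audio) and a_time < packet_end_time:
            a_time += audio[a_end]
            a_end += 1
        packets.append((video[v:v_end], audio[a:a_end]))     # S140: one packet
        v, a, v_time = v_end, a_end, packet_end_time         # S160: discard packetized data
    return packets


packets = multiplex([5] * 6, [2] * 15, target_time=10)
```

With these inputs, three packets result, each holding two video samples and five audio samples that cover the same ten time units.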

In this way, the multiplexer 100 determines the packetization part of the audio data after determining the packetization part of the video data first and generates the extension part of the MP4 file by multiplexing the media data.

Here, in the step S110 in FIG. 13, the processing operation of determining the packetization part of the video data by the video packetization part determination unit 109 will be explained in detail.

FIG. 14 is a flow chart showing the processing operation of the video packetization part determination unit 109.

The video packetization part determination unit 109 obtains the target time from the time adjustment unit 108 prior to this flow.

After that, the video packetization part determination unit 109 obtains the video sample header information from the first analysis unit 103 (S111) and adds the video sample header information to the video packet generation table (S112).

At this time, the video packetization part determination unit 109 judges whether or not the total of the playback durations of the video samples indicated by the video sample header information items, that is, the total playback duration of the video data included in the packet, has reached or exceeded the previously obtained target time (S113).

In the case where the total playback duration of the video data included in the packet does not reach the target time (No in S113), the video packetization part determination unit 109 obtains next video sample header information (S111) and repeats the processing operations of S112 and S113.

In the case where the total playback duration of the video data included in the packet reaches the target time (Yes in S113), the video packetization part determination unit 109 determines the video sample indicated by the video sample header information that is added to the video packet generation table last as the last video sample included in the packet (S114) and finishes the processing operation of determining a packetization part.
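The loop of FIG. 14 can be sketched as follows. The sketch assumes that each video sample header is reduced to its playback duration and that the video packet generation table is a plain list; the function name determine_video_part is hypothetical.

```python
def determine_video_part(durations, target_time):
    """Return the video packet generation table (a list of durations) for one packet."""
    table = []
    total = 0
    for duration in durations:     # S111: obtain the next video sample header
        table.append(duration)     # S112: add it to the video packet generation table
        total += duration
        if total >= target_time:   # S113: target time reached or exceeded?
            break                  # S114: the sample added last ends the packet
    return table


part = determine_video_part([3, 3, 3, 3, 3], target_time=10)  # [3, 3, 3, 3]
```

Note that, as in the flow chart, the sample that makes the total reach or exceed the target time is itself included in the packet, so the packet in this example lasts 12 time units for a target time of 10.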

Next, in the step S130 in FIG. 13, the processing operation of determining the packetization part of the audio data by the audio packetization part determination unit 110 will be explained in detail.

FIG. 15 is a flow chart showing the processing operation of the audio packetization part determination unit 110.

The audio packetization part determination unit 110 obtains the video sample playback time information from the video packetization part determination unit 109 prior to this flow.

After that, the audio packetization part determination unit 110 obtains the audio sample header information from the second analysis unit 106 (S131), refers to the previously obtained video sample playback time information (S132), reads out the playback start time of the leading video sample included in the packet and determines, as the leading audio sample of the packet, the audio sample whose playback start time is the same or approximately the same as the playback start time of the leading video sample included in the packet (S133).

After determining the leading audio sample included in the packet, the audio packetization part determination unit 110 obtains the audio sample header information items in sequence (S134) and adds the audio sample header information items to the audio packet generation table (S135).

After that, the audio packetization part determination unit 110 reads out the total of the playback durations of the video samples included in the packet by referring to the video sample playback time information (S136), determines the last audio sample included in the packet so that the total of the playback durations of the audio samples included in the packet becomes the same or approximately the same as the total of the playback durations of the video samples included in the packet (S137) and finishes the processing operation of determining the packetization part.
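A minimal sketch of the flow of FIG. 15 follows. It assumes that each audio sample is given as a hypothetical (playback start time, playback duration) pair sorted by start time, that the leading audio sample is the earliest one starting at or after the leading video sample, and that "approximately the same" is interpreted as stopping once the accumulated audio duration reaches the video total. Other interpretations are equally possible.

```python
def determine_audio_part(audio, video_start, video_total):
    """Return the audio packet generation table for one packet.

    audio is a list of (start_time, duration) pairs sorted by start_time.
    """
    # S131 to S133: find the leading audio sample of the packet
    first = next((i for i, (start, _) in enumerate(audio) if start >= video_start), None)
    if first is None:
        return []
    table = []
    total = 0
    for start, duration in audio[first:]:   # S134: obtain headers in sequence
        if total >= video_total:            # S136, S137: audio now covers the video part
            break
        table.append((start, duration))     # S135: add to the audio packet generation table
        total += duration
    return table


audio = [(i * 4, 4) for i in range(10)]     # start times 0, 4, ..., 36
part = determine_audio_part(audio, video_start=10, video_total=12)
```

Here the packet receives the three audio samples starting at 12, 16 and 20, whose durations add up to the video total of 12.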

The extension part of the MP4 file generated through the processing operation by the multiplexer 100 like this has excellent data access efficiency at a playback apparatus. The reason will be explained with reference to the data structure examples of the MP4 file extension part generated by the multiplexer 100 shown in FIG. 16.

The MP4 file extension part 200 shown in FIG. 16A is made of a plurality of packets and connected to the basic part of the MP4 file.

Each of the packets that constitute the MP4 file extension part 200 is made of moof, which is the packet header part, and mdat, which is the packet data part. Here, packet_1 is the first packet of the MP4 file extension part 200, moof included in packet_1 is shown as moof_1, and mdat included in packet_1 is shown as mdat_1. Also, “V” shown in each mdat of FIG. 16A indicates a video sample, while “A” shown in each mdat of FIG. 16A indicates an audio sample (the same is true of the other figures).

The video sample whose playback start time is 20 minutes is stored in mdat_1 of the MP4 file extension part 200 as a leading video sample, and also the audio sample whose playback start time is 20 minutes is stored in mdat_1 of the MP4 file extension part 200 as a leading audio sample. Also, a video sample whose playback start time is 30 minutes is stored in mdat_2 as a leading video sample, and also an audio sample whose playback start time is 30 minutes is stored in mdat_2 as a leading audio sample.

In this way, storing a video sample and an audio sample in a single packet in a way that their playback start times are made to be the same or approximately the same as each other makes it possible to greatly reduce the calculation amount needed for data access at the time of playing back the MP4 file extension part 200 at the playback apparatus side.

Also, as the media data items are stored in a packet after their playback start times are made to be the same or approximately the same as each other, it is possible to adjust the size of the MP4 file data to a desired size.

Here, the MP4 file extension part generated by the multiplexer 100 may have the data structure shown in FIG. 16B.

FIG. 16B is a diagram showing the second example of the data structure of the MP4 file extension part generated by the multiplexer 100.

A video sample whose playback start time is 20 minutes is stored in mdat_1 of the MP4 file extension part 210 shown in FIG. 16B as a leading video sample, and an audio sample whose playback start time is 20 minutes is stored in mdat_2 as a leading audio sample. Also, a video sample whose playback start time is 30 minutes is stored in mdat_3 as a leading video sample, and an audio sample whose playback start time is 30 minutes is stored in mdat_4 as a leading audio sample.

In this way, storing either video data or audio data in a single packet and alternately arranging a packet storing video data items and a packet storing audio data items whose playback start times are made to be the same or approximately the same as each other can also greatly reduce the calculation amount needed for data access at the time of playing back the MP4 file extension part 210 at the playback apparatus.

As explained up to this point, the multiplexer 100 in this first embodiment can improve the efficiency of data access at the playback apparatus side because respective media data items are packetized after their playback start times are made to be the same or approximately the same as each other.

Second Embodiment

Next, the multiplexer in the second embodiment of the present invention will be explained with reference to FIG. 17 to FIG. 20.

The multiplexer in the second embodiment has the same main units as the multiplexer 100 in the above-mentioned first embodiment, but it differs from the multiplexer 100 in the above-mentioned first embodiment in that its packetization part determination unit has a unique structure. The following explanation focuses on this difference. Note that the same reference numerals are used for the same units as in the above-mentioned first embodiment and their explanation will be omitted.

FIG. 17 is a block diagram showing the functional structure of the packetization part determination unit of a multiplexer in the second embodiment.

This packetization part determination unit 117 is the processing unit that integrates the header information of the video samples and the audio samples included in a packet and determines a packetization part of the video data and the audio data in a way that playback start times are made to be the same or approximately the same as each other and the leading video sample included in a packet becomes an intra frame, and includes a time adjustment unit 108, a video packetization part determination unit 119 and an audio packetization part determination unit 110.

The video packetization part determination unit 119 is the processing unit that obtains video sample header information from the first analysis unit 103 and determines a packetization part of video data based on either time or an intra frame, and includes a time-based part adjustment unit 120 and an I frame-based part adjustment unit 121.

The time-based part adjustment unit 120 is the processing unit that adjusts a packetization part of video data based on the target time outputted from the time adjustment unit 108, and adjusts a packetization part in a way that a packet becomes a predetermined time unit by counting the playback durations in the respective video sample headers.

The I frame-based part adjustment unit 121 is the processing unit that adjusts a packetization part of video data based on whether the information indicating an intra frame is included in the video sample header information outputted from the first analysis unit 103. The I frame-based part adjustment unit 121 obtains the video sample header information that includes the information indicating an intra frame, switches packetization parts at a video sample of an intra frame, and adjusts the packetization part in a way that the leading video sample of the next packet becomes the video sample of an intra frame.

The processing operation that determines a packetization part of video data by the video packetization part determination unit 119 of the multiplexer in the second embodiment that includes a packetization part determination unit 117 constituted like this will be explained in detail.

FIG. 18 is a flow chart showing the processing operation of the video packetization part determination unit 119.

The video packetization part determination unit 119 obtains the target time from the time adjustment unit 108 and stores it in the time-based part adjustment unit 120 prior to this flow.

After that, as in the above-mentioned first embodiment, the video packetization part determination unit 119 obtains the video sample header information from the first analysis unit 103 (S201) and adds the video sample header information to the video packet generation table (S202).

At this time, the video packetization part determination unit 119 judges whether the information indicating an intra frame is included in the obtained video sample header information in the I frame-based part adjustment unit 121 (S203).

In the case where the information indicating an intra frame is included (Yes in S203), the video packetization part determination unit 119 judges whether the total playback duration of all the video samples included in a packet exceeds the previously obtained target time in the time-based part adjustment unit 120 (S205).

Here, in the case where no information indicating an intra frame is included (No in S203) or in the case where the total duration does not exceed the target time (No in S205), the video packetization part determination unit 119 updates the total of the playback durations of video samples included in the packet by adding the playback duration of the video sample included in the video sample header information in the time-based part adjustment unit 120 (S204), obtains next video sample header information (S201) and repeats the above-mentioned processing operation.

On the other hand, in the case where the total duration exceeds the target time (Yes in S205), the video packetization part determination unit 119 determines the video sample immediately before the video sample judged as an intra frame in the I frame-based part adjustment unit 121 as the last video sample included in the packet (S206) and finishes the processing operation of determining a packetization part of video data.

In the extension part of the MP4 file generated through the processing operation of the video packetization part determination unit 119 like this, the video sample stored in the leading part of a packet always becomes a video sample of an intra frame. Therefore, playback can be started from the leading video sample of a packet at the time of random access at a playback apparatus side, and thus it is possible to greatly reduce the calculation amount needed for searching a randomly-accessible video sample.
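The flow of FIG. 18 can be sketched as follows, assuming that each video sample header is reduced to a hypothetical (playback duration, intra frame flag) pair and that the result is expressed as the number of samples belonging to the packet. The function name determine_video_part_iframe is not part of this description.

```python
def determine_video_part_iframe(samples, target_time):
    """Return how many leading samples form one packet.

    samples is a list of (duration, is_intra) pairs. The packet ends just before
    the first intra frame found after the accumulated duration exceeds target_time.
    """
    total = 0
    for i, (duration, is_intra) in enumerate(samples):  # S201, S202
        if is_intra and total > target_time:            # S203 and S205
            return i        # S206: samples[i] becomes the leading sample of the next packet
        total += duration                               # S204
    return len(samples)     # input exhausted before a suitable intra frame appeared


# I P P P I P P I P, each with a duration of 1
gop = [(1, True), (1, False), (1, False), (1, False),
       (1, True), (1, False), (1, False), (1, True), (1, False)]
count = determine_video_part_iframe(gop, target_time=3)  # 4
```

In this example the first packet holds the first four samples, and the intra frame at index 4 leads the next packet. With a target time of 5, the intra frame at index 4 is passed over because the accumulated duration is only 4, and the packet ends before the intra frame at index 7.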

Also, as the video sample stored in the leading part of the packetsurely becomes the video sample of an intra frame, only the informationindicating as randomly accessible must be described only in the leadingsample flag field of trun that is located in the leading part of trafthat holds header information of a video track in the packet header part(moof) and respective sample flag fields of the respective trun can beomitted by using default values, and thus the workload at the time ofgenerating moof data is reduced and the size of the whole MP4 file canalso be reduced.

Note that, in this processing operation, the playback duration of a single packet may become long when the interval between intra frames included in the video data is wide. Therefore, the packetization part determination unit 117 may instead perform the processing operation described below.

FIG. 19 is a flow chart showing the second processing operation of the video packetization part determination unit 119.

As in the above-mentioned first processing operation, the video packetization part determination unit 119 obtains the target time from the time adjustment unit 108 and stores it in the time-based part adjustment unit 120 prior to this flow.

After that, the video packetization part determination unit 119 obtains the video sample header information from the first analysis unit 103 (S211) and adds the video sample header information to the video packet generation table (S212).

At this time, the video packetization part determination unit 119 judges, in the time-based part adjustment unit 120, whether the total playback time of all the samples included in the packet exceeds the previously obtained target time (S213).

In the case where the total time exceeds the target time (Yes in S213), the video packetization part determination unit 119 determines the video sample indicated by the video sample header information that is immediately before the video sample header information obtained this time as the last video sample included in the packet (S214) and finishes the processing operation of determining the packetization part of the video data.

On the other hand, in the case where the total time does not exceed the target time (No in S213), the video packetization part determination unit 119 judges, in the I frame-based part adjustment unit 121, whether the information indicating an intra frame is included in the obtained video sample header information (S215).

Here, in the case where the information indicating an intra frame is included (Yes in S215), the video packetization part determination unit 119 determines, as the last video sample included in the packet, the video sample that is immediately before the video sample that is judged as an intra frame in the I frame-based part adjustment unit 121 (S214) and finishes the processing operation of determining the packetization part of video data.

On the other hand, in the case where no information indicating an intra frame is included (No in S215), the video packetization part determination unit 119 updates the total of the playback durations of the video samples included in the packet by adding, in the time-based part adjustment unit 120, the playback duration of the video sample included in the video sample header information (S216), obtains the next video sample header information (S211) and repeats the above-mentioned processing operation.
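The flow of FIG. 19 can be sketched as follows. This is a minimal illustration and not the implementation of the embodiment: the names VideoSample and split_into_packets are hypothetical, the comparison in S213 is interpreted as "adding the current sample would exceed the target time", and the leading sample of each packet is always accepted so that no packet is empty.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoSample:
    duration_ms: int   # playback duration taken from the sample header information
    is_intra: bool     # True when the header marks the sample as an intra frame


def split_into_packets(samples: List[VideoSample], target_ms: int) -> List[List[VideoSample]]:
    """Close a packet before a sample that breaks the time limit or is an intra frame."""
    packets: List[List[VideoSample]] = []
    current: List[VideoSample] = []
    total_ms = 0
    for sample in samples:                                  # S211: next header information
        over_limit = total_ms + sample.duration_ms > target_ms   # S213 (assumed reading)
        if current and (over_limit or sample.is_intra):     # S213 / S215
            packets.append(current)                         # S214: previous sample is the last
            current, total_ms = [], 0
        current.append(sample)                              # S212
        total_ms += sample.duration_ms                      # S216
    if current:
        packets.append(current)
    return packets


# I P P P P P I P, 500 ms each, with a target time of 2000 ms.
pattern = "IPPPPPIP"
stream = [VideoSample(500, kind == "I") for kind in pattern]
sizes = [len(p) for p in split_into_packets(stream, 2000)]
print(sizes)
```

In this example the first packet is closed by the time limit and the second by the arrival of an intra frame, so a packet begins with an intra frame whenever one is available and never exceeds the target time.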

In generating the extension part of the MP4 file, this second processing operation of the video packetization part determination unit 119 sets a predetermined time limit on each packet so as to keep the packet size within the desired size and, in the case where video samples of intra frames are included, stores them in the leading part of the packets. At the time of random access at the playback apparatus side, it is therefore only necessary to judge whether the leading video sample of each packet is a randomly-accessible video sample, and thus it becomes possible to reduce the calculation amount needed for searching for randomly-accessible video samples.

Note that, as in the above-mentioned first embodiment, when the video packetization part determination unit 119 finishes the processing operation of determining a packetization part of video data, it outputs the video sample playback time information to the audio packetization part determination unit 110, and the processing operation of determining a packetization part of audio data is performed in the audio packetization part determination unit 110.

The extension part of the MP4 file generated through this processing operation by the packetization part determination unit 117 reduces the searching workload at the time of random access in a playback apparatus. The reason will be explained with reference to FIG. 20, which shows a data structure example of the MP4 file extension part generated by the multiplexer in the second embodiment.

In mdat_1 of the MP4 file extension part 220 shown in FIG. 20A, a video sample of an intra frame is stored as the leading video sample, and a video sample of an intra frame is also stored in mdat_2 as the leading video sample.

In this way, storing a video sample of an intra frame in the packet as the leading video sample makes it sufficient to search only the leading video sample in each packet in order to obtain a randomly-accessible video sample at the time of random access at the playback apparatus side. This eliminates the necessity of searching all the video samples included in the packet, and thus it is possible to greatly reduce the workload in searching samples at the time of random access.

Also, at this time, describing the information indicating that a sample is randomly accessible only in the leading sample flag field of the trun located in the leading part of the traf that stores header information of the video track in moof_1 and moof_2 of the MP4 file extension part 220 makes it possible to reduce the size of moof_1 and moof_2.

Here, the extension part of the MP4 file generated by the multiplexer in the second embodiment may have the data structure shown in FIG. 20B.

In mdat_1 of the MP4 file extension part 230 shown in FIG. 20B, a video sample of an intra frame is stored as the leading video sample, and a video sample of an intra frame is also stored in mdat_3 as the leading video sample. Also, audio samples are stored in mdat_2 and mdat_4.

In this way, storing only one of video data and audio data in a single packet, and storing a video sample of an intra frame as the leading video sample in each packet that stores video data, makes it possible to greatly reduce the workload in searching samples at the time of random access at the playback apparatus side.

Note that, in any of these data structure examples of the MP4 file extension part, making the playback start time of the leading video sample stored in the packet the same or approximately the same as the playback start time of the leading audio sample makes it possible to greatly reduce the calculation amount needed for data access at the playback apparatus.

As explained up to this point, with the multiplexer in the second embodiment, it is possible to reduce the calculation amount needed for searching samples at the time of random access at the playback apparatus because it generates packets in such a way that a randomly-accessible video sample is the leading video sample.

Third Embodiment

Next, the multiplexer in the third embodiment of the present invention will be explained with reference to FIG. 21 to FIG. 25.

The multiplexer in the third embodiment has the same main units as the multiplexers in the above-mentioned first and second embodiments, but it differs from them in the structure of its packet data generation unit. The following explanation focuses on this difference. Note that the same reference numerals are used for the same units as in the above-mentioned first and second embodiments, and explanations of them will be omitted.

FIG. 21 is a block diagram showing the functional structure of the packet data generation unit of the multiplexer in the third embodiment.

This packet data generation unit 130 is the processing unit that generates a packet data unit (mdat) by interleaving and storing the entity data of the video samples and the entity data of the audio samples, and includes an mdat information obtainment unit 131, a video entity data reading unit 132, an audio entity data reading unit 133 and an interleave arrangement unit 134.

The mdat information obtainment unit 131 is the processing unit that obtains the mdat information from the packet header generation unit 112 and outputs the read instructions for the entity data or the playback time information to the other units that constitute the packet data generation unit 130.

This mdat information obtainment unit 131 obtains mdat information from the packet header generation unit 112, analyzes the mdat information, obtains the playback time information indicating the playback start times and the playback end times of the video samples and audio samples, and rearranges all the video samples and audio samples included in the packet based on this playback time information so that their playback start times are in ascending order.

After that, following the rearranged order and starting from the sample whose playback start time is the earliest, the mdat information obtainment unit 131 outputs, to the video entity data reading unit 132, a video read instruction that instructs reading out the entity data of a video sample, or outputs, to the audio entity data reading unit 133, an audio read instruction that instructs reading out the entity data of an audio sample. The video read instruction includes pointer information indicating in which part of the first data storage unit 102 the entity data of the video sample is stored and the size information of the video sample, and the audio read instruction includes pointer information indicating in which part of the second data storage unit 105 the entity data of the audio sample is stored and the size information of the audio sample.

The video entity data reading unit 132 is the processing unit that obtains the video read instruction from the mdat information obtainment unit 131 and reads out the video entity data from the first data storage unit 102. The video entity data reading unit 132 reads out the video entity data from the first data storage unit 102 with reference to the pointer information and the size information included in the video read instruction and outputs the read video entity data to the interleave arrangement unit 134.

The audio entity data reading unit 133 is the processing unit that obtains the audio read instruction from the mdat information obtainment unit 131 and reads out the audio entity data from the second data storage unit 105. This audio entity data reading unit 133 reads out the audio entity data from the second data storage unit 105 with reference to the pointer information and the size information included in the audio read instruction and outputs the read audio entity data to the interleave arrangement unit 134.

The interleave arrangement unit 134 is the processing unit that obtains, in output order, the read video data and the read audio data that are outputted from the video entity data reading unit 132 and the audio entity data reading unit 133, generates mdat data by interleaving and arranging them, and outputs the mdat data to the packet connection unit 114.

The processing operation of generating mdat by the packet data generation unit 130, constituted as described above, in the multiplexer in the third embodiment will be explained in detail.

FIG. 22 is a flow chart showing the processing operation of the packet data generation unit 130.

First, the packet data generation unit 130 obtains mdat information from the packet header generation unit 112 in the mdat information obtainment unit 131 (S301). The mdat information obtainment unit 131 analyzes the obtained mdat information and extracts sample pointer information, sample size information and sample playback time information. After that, the mdat information obtainment unit 131 rearranges all the video samples and the audio samples included in a packet based on the extracted sample playback time information so that their playback start times are in ascending order. Subsequently, following the rearranged order and starting from the sample whose playback start time is the earliest, the mdat information obtainment unit 131 outputs the video read instruction including the pointer information and the size information of the extracted video sample to the video entity data reading unit 132, or outputs the audio read instruction including the pointer information and the size information of the extracted audio sample to the audio entity data reading unit 133.

The video entity data reading unit 132 obtains the video read instruction, reads out the video entity data from the first data storage unit 102 with reference to the pointer information and the size information, and outputs it to the interleave arrangement unit 134. The audio entity data reading unit 133 obtains the audio read instruction, reads out the audio entity data from the second data storage unit 105 with reference to the pointer information and the size information, and outputs it to the interleave arrangement unit 134 (S302).

The interleave arrangement unit 134 receives the read entity data from the video entity data reading unit 132 and the audio entity data reading unit 133 and arranges them in the receiving order (S303).

Here, the interleave arrangement unit 134 continues arranging the entity data items until all the entity data items to be stored in a single packet, that is, all the video entity data items and the audio entity data items, have been arranged (No in S304 and S303).

After that, when all the entity data items to be stored in a single packet have been arranged (Yes in S304), the interleave arrangement unit 134 outputs the arranged entity data to the packet connection unit 114 as mdat data (S305) and finishes the processing operation of generating mdat.
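The flow of S301 to S305 can be sketched as follows. This is only an illustration under stated assumptions: the names ReadInstruction and generate_mdat are hypothetical, the two data storage units are modelled as byte strings, and when a video sample and an audio sample have the same playback start time the video sample is placed first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadInstruction:
    kind: str       # "video" or "audio"
    start_ms: int   # playback start time of the sample
    pointer: int    # offset of the entity data in its data storage unit
    size: int       # size of the entity data in bytes


def generate_mdat(instructions: List[ReadInstruction],
                  video_store: bytes, audio_store: bytes) -> bytes:
    """Arrange entity data in ascending order of playback start time."""
    # S301: rearrange by start time; on a tie, video comes first (assumption).
    ordered = sorted(instructions, key=lambda i: (i.start_ms, i.kind != "video"))
    mdat = bytearray()
    for ins in ordered:
        store = video_store if ins.kind == "video" else audio_store
        mdat += store[ins.pointer:ins.pointer + ins.size]   # S302 read, S303 arrange
    return bytes(mdat)                                      # S305 output


# Two 500 ms video samples and ten 100 ms audio samples, all starting at 4000 ms.
video_store = b"V1V2"
audio_store = b"".join(f"a{i}".encode() for i in range(10))
instructions = [ReadInstruction("audio", 4000 + 100 * i, 2 * i, 2) for i in range(10)]
instructions += [ReadInstruction("video", 4000 + 500 * j, 2 * j, 2) for j in range(2)]
mdat = generate_mdat(instructions, video_store, audio_store)
print(mdat)
```

With these inputs each video sample is followed by the five audio samples that are played back during its duration.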

The extension part of the MP4 file generated through this processing operation of the packet data generation unit 130 is suitable for random access playback in an optical disc apparatus or the like that requires a long seek time. The reason will be explained with reference to FIG. 23, which shows the outline of the data structure of the MP4 file extension part generated by the multiplexer in the third embodiment.

The MP4 file extension part 240 shown in FIG. 23 is made of a plurality of arranged packets: packet 1 that stores content data of 4 to 8 seconds; packet 2 that stores content data of 8 to 12 seconds; and packet 3 that stores content data of 12 to 16 seconds.

Each packet is made of moof 241 and mdat 242, and in the moof 241, tfhd (V) and traf (V-1, V-2) concerning a video track and tfhd (A) and traf (A-1, A-2) concerning an audio track are stored. Also, the entity data of the samples indicated by the header information stored in traf (V-1) and traf (A-1) are stored in mdat_1, and the entity data of the samples indicated by the header information stored in traf (V-2) and traf (A-2) are stored in mdat_2. In addition, in mdat 242, the entity data of video samples and the entity data of audio samples are stored in such a way that they are alternately interleaved.

At this time, in random access processing at the playback apparatus side that starts playback from the position of 4 seconds in playback time, moving the read pointer to the leading position of moof_1, analyzing moof_1, and then moving the read pointer in sequence makes it possible to obtain the entity data necessary for playback from mdat_1, which is next to moof_1.

In other words, with this MP4 file extension part 240, the playback apparatus can realize random access playback by a single seek operation that moves the read pointer to the leading position of moof_1, and thus this data structure is effective for an optical disc apparatus or the like that requires a long time to seek.

Here, in mdat 242, because the playback start time of the entity data of an audio sample stored immediately after the entity data of a video sample is made to be the same or approximately the same as the playback start time of the immediately preceding video sample, the synchronous playback of video data and audio data is secured. FIG. 24 shows how the entity data is stored in mdat_1 of the MP4 file extension part 240.

As shown in FIG. 24, the playback start time of the video sample 1 stored in the leading part of mdat_1 is 4000 ms, the playback start time of the audio sample 1 stored immediately after the video sample 1 is 4000 ms, and thus the playback start times of the video sample 1 and the audio sample 1 are made to be the same or approximately the same as each other.

Generally, the sample rate of a video sample differs from the sample rate of an audio sample in most cases. Here, the playback duration of a video sample is 500 ms, and the playback duration of an audio sample is 100 ms.

Therefore, in mdat_1 of the MP4 file extension part 240, audio samples 1 to 5 are interleaved and stored immediately after the video sample 1, and after them, video sample 2, audio samples 6 to 10 and video sample 3 are stored in sequence.

At this time, the playback start time of the video sample 2 is 4500 ms, the playback start time of the audio sample 6 stored immediately after the video sample 2 is 4500 ms, and thus the playback start time of a video sample and the playback start time of the audio sample that is immediately after the video sample are always made to be the same or approximately the same as each other.

Also, because the sample rate of a video sample differs from the sample rate of an audio sample, there may be a case where the playback start time of the video sample is not the same or approximately the same as the playback start time of the audio sample that is immediately after the video sample. Even in this case, using an audio sample whose playback start time is close to the playback start time of the video sample as the audio sample immediately after the video sample secures the synchronous playback of the video data and the audio data.

FIG. 25 is a diagram showing the second data structure indicating how the entity data is stored in mdat_1 of the MP4 file extension part.

As shown in FIG. 25, the playback start time of the video sample 1 stored in the leading part of mdat_1 of the MP4 file extension part 250 is 4000 ms, and the playback start time of the audio sample 1 stored immediately after the video sample 1 is 4050 ms. That is, among the audio samples whose playback start times are after the playback start time of the video sample 1, the one whose playback start time is the earliest is stored as the audio sample immediately after the video sample 1.

Here, as in the case that has already been explained, the playback duration of a video sample is 500 ms, and the playback duration of an audio sample is 100 ms.

Therefore, in mdat_1 of the MP4 file extension part 250, audio samples 1 to 5 are interleaved and stored immediately after the video sample 1, and after them, video sample 2, audio samples 6 to 10 and video sample 3 are stored in sequence.

At this time, the playback start time of the video sample 2 is 4500 ms, the playback start time of the audio sample 6 stored immediately after the video sample 2 is 4550 ms, and thus the playback start time of a video sample and the playback start time of the audio sample immediately after the video sample are made to be approximately the same as each other.

Note that, as the audio sample stored immediately after the video sample, it is also possible to store, among the audio samples whose playback start times are before the playback start time of the video sample, the one whose playback start time is the latest. In this case, the playback start time of the audio sample 1 stored immediately after the video sample 1 is 3950 ms.
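The two ways of choosing the audio sample to be stored immediately after a video sample can be sketched as follows. The function name and the policy names "next" and "previous" are hypothetical, and the audio start times are generated from the 100 ms duration and the 50 ms offset used in the example of FIG. 25.

```python
from typing import Iterable, Optional


def pick_audio_start(video_start_ms: int, audio_starts_ms: Iterable[int],
                     policy: str = "next") -> Optional[int]:
    """Return the start time of the audio sample to store right after the video sample."""
    starts = list(audio_starts_ms)
    if policy == "next":
        # Earliest audio sample that starts at or after the video sample.
        candidates = [t for t in starts if t >= video_start_ms]
        return min(candidates) if candidates else None
    if policy == "previous":
        # Latest audio sample that starts at or before the video sample.
        candidates = [t for t in starts if t <= video_start_ms]
        return max(candidates) if candidates else None
    raise ValueError(f"unknown policy: {policy}")


# Audio samples of 100 ms each, offset by 50 ms from the video timeline.
audio_starts = range(3950, 5000, 100)
after_v1 = pick_audio_start(4000, audio_starts, "next")
after_v2 = pick_audio_start(4500, audio_starts, "next")
before_v1 = pick_audio_start(4000, audio_starts, "previous")
print(after_v1, after_v2, before_v1)
```

When an audio sample starts at exactly the same time as the video sample, both policies select that sample, which corresponds to the case of FIG. 24.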

As explained up to this point, with the multiplexer in the third embodiment, an audio sample whose playback start time is the same or approximately the same as the playback start time of a video sample is placed immediately after the video sample, and video samples and audio samples are interleaved and stored in mdat in such a way that their playback start times are arranged in ascending order. It is therefore possible to generate an MP4 file extension part which has a data structure that enables immediate random access even in a playback apparatus which has a slow seek speed. Further, video and audio samples can be interleaved in units that each consist of more than one sample.

Fourth Embodiment

Next, the demultiplexer in the fourth embodiment of the present invention will be explained with reference to FIG. 26 and FIG. 27.

FIG. 26 is a block diagram showing a functional structure of the demultiplexer in the fourth embodiment.

The demultiplexer 300 is the apparatus that obtains and analyzes the MP4 file data including the MP4 file extension part generated by the multiplexer in the above-mentioned first, second and third embodiments, demultiplexes the media data and outputs the playback data, and includes a file input unit 301, a file data storage unit 302, a header demultiplex analysis unit 303, a moov analysis unit 304, a moof analysis unit 305, a traf analysis unit 306, a trun analysis unit 307, an RA searching unit 308 and a sample obtainment unit 309.

The file input unit 301 is an interface that obtains MP4 file data and stores the input data items of the obtained MP4 file in the file data storage unit 302 in sequence.

The file data storage unit 302 is a cache memory, a RAM or the like that temporarily stores the MP4 input data.

The header demultiplex analysis unit 303 is the processing unit that reads out and analyzes the header data in the MP4 file among the MP4 input data items stored in the file data storage unit 302, demultiplexes the moov data of the basic part header in the MP4 file from the moof data of the extension part and outputs them to the moov analysis unit 304 and the moof analysis unit 305 respectively, and is realized by a CPU, a memory or the like.

The moov analysis unit 304 is the processing unit that analyzes the moov of the MP4 file and obtains the media information necessary for analyzing the media data, such as the coding rate of the media data or the playback time of a content, and is realized by a CPU, a memory or the like. This moov analysis unit 304 outputs the obtained media information to the moof analysis unit 305.

The moof analysis unit 305 is the processing unit that analyzes the moof of the MP4 file based on the media information obtained from the moov analysis unit 304 and outputs traf data, which is the header data for each track, to the traf analysis unit 306, and is realized by a CPU, a memory or the like.

The traf analysis unit 306 is the processing unit that analyzes the traf of the MP4 file and outputs trun data, which is the header data for each sample included in the traf, to the trun analysis unit 307, and is realized by a CPU, a memory or the like.

The trun analysis unit 307 is the processing unit that analyzes the trun of the MP4 file, obtains the information described in each field of the trun, and outputs trun analysis information to the sample obtainment unit 309, and is realized by a CPU, a memory or the like. This trun analysis information includes, for example, a sample size, data offset information indicating in which part of the file data storage unit 302 the sample is stored and, in the case of a video sample, flag information indicating whether or not it is an intra frame.

Also, on obtaining, from the RA searching unit 308 that is explained next, a playback start instruction that shows a playback start position after random access and instructs the start of playback, this trun analysis unit 307 analyzes truns in sequence starting from the trun shown by the playback start instruction and outputs trun analysis information to the sample obtainment unit 309.

The RA searching unit 308 is the processing unit that obtains target playback time information showing the playback start time after random access, reads out the leading sample information, which is the information indicating the playback start time of the leading sample included in the leading trun in the leading traf that stores the header information concerning the video track, and searches for the video sample that is the playback start position after random access, and is realized by a CPU, a memory or the like. On obtaining the target playback time information from the input apparatus of the demultiplexer 300 that receives a random access instruction from a user, this RA searching unit 308 obtains only the leading sample information from the trun analysis unit 307 in sequence, searches for a video sample whose playback start time is the same or approximately the same as the time shown by the target playback time information and outputs a playback start instruction to the trun analysis unit 307.

The sample obtainment unit 309 is the processing unit that reads out and decodes the entity data of a sample based on trun analysis information and outputs the playback data to a display apparatus such as a display. On obtaining trun analysis information from the trun analysis unit 307, this sample obtainment unit 309 refers to the data offset information included in it and reads the entity data of a sample from the file data storage unit 302. Here, the start of obtaining trun analysis information means that the start of playback is instructed.

The operation of random access processing in the demultiplexer 300 constituted as described above will be explained with reference to FIG. 27.

FIG. 27 is a flow chart showing the operation of random access processing of the demultiplexer 300. Note that the demultiplexer 300 receives a random access instruction from a user via an input apparatus prior to this flow.

First, on obtaining, in the file input unit 301, data items of the MP4 file generated by the multiplexer in the above-mentioned first, second and third embodiments (S400), the demultiplexer 300 stores the data items in the file data storage unit 302 in sequence.

Next, the demultiplexer 300 demultiplexes and analyzes only the file header part of the MP4 file in the header demultiplex analysis unit 303 (S410), further demultiplexes the basic part header from the extension part header, analyzes the basic part header in the moov analysis unit 304 and analyzes the extension part header in the moof analysis unit 305 (S420).

Subsequently, the demultiplexer 300 further demultiplexes the extension part header into headers for each track in the moof analysis unit 305, and analyzes the track fragment, that is, traf, in the traf analysis unit 306 (S430). At this time, the demultiplexer 300 further demultiplexes the track fragment in the traf analysis unit 306 and analyzes trun in the trun analysis unit 307.
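The nesting of moof, traf and trun that these analysis units walk through can be sketched as follows. The sketch assumes the box structure of the ISO base media file format, in which each box starts with a 32-bit big-endian size followed by a four-character type, and it handles only this 32-bit size form. The names iter_boxes and make_box are hypothetical, and the payloads are dummy bytes rather than real header fields.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(buf: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, payload_end) for each box in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        if size < 8 or pos + size > end:
            raise ValueError("unsupported or corrupt box size")
        yield box_type.decode("ascii"), pos + 8, pos + size
        pos += size


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size field (test helper, dummy payloads only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# One moof holding a video traf and an audio traf, each with a single trun.
traf = make_box("traf", make_box("tfhd", bytes(8)) + make_box("trun", bytes(8)))
moof = make_box("moof", make_box("mfhd", bytes(8)) + traf + traf)

(_, moof_start, moof_end), = iter_boxes(moof)
children = [t for t, _, _ in iter_boxes(moof, moof_start, moof_end)]
truns = [t
         for kind, s, e in iter_boxes(moof, moof_start, moof_end) if kind == "traf"
         for t, _, _ in iter_boxes(moof, s, e) if t == "trun"]
print(children, truns)
```

Because every box states its own size, the headers for each track can be separated without decoding any sample data.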

Here, in response to the input of target playback time information to the RA searching unit 308, the demultiplexer 300 outputs the leading sample information from the trun analysis unit 307 to the RA searching unit 308 and judges, in the RA searching unit 308, whether or not the playback start time indicated by the leading sample information is the same or approximately the same as the one shown by the target playback time information (S440).

At this time, in the case where no target sample is found (No in S450), the demultiplexer 300 obtains, in the RA searching unit 308, the leading sample information in the extension part header that is located next in storing sequence in the file and judges whether or not its playback start time is the same or approximately the same as the one shown by the target playback time information that has already been obtained (S440).

On the other hand, in the case where a target sample is found (Yes in S450), the demultiplexer 300 generates a playback start instruction in the RA searching unit 308 and outputs it to the trun analysis unit 307. On receiving a playback start instruction from the RA searching unit 308, the trun analysis unit 307 outputs trun analysis information to the sample obtainment unit 309 starting from the trun to which the playback start instruction is given. Here, the trun to which a playback start instruction is given indicates the trun including the sample for which playback start is indicated in the RA searching unit 308.

After that, the demultiplexer 300 refers, in the sample obtainment unit 309, to the data offset information included in the trun analysis information, obtains the entity data of the target sample from the file data storage unit 302 (S460), decodes the data and outputs the playback data, and then finishes the operation of random access processing.
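The search loop of S440 and S450 can be sketched as follows. The function name find_start_packet is hypothetical, and "approximately the same" is modelled here as a tolerance in milliseconds, which is an assumption made for illustration because the description does not define the margin.

```python
from typing import Optional, Sequence


def find_start_packet(leading_start_ms: Sequence[int], target_ms: int,
                      tolerance_ms: int = 0) -> Optional[int]:
    """Scan only the leading video sample of each packet, in file order."""
    for index, start_ms in enumerate(leading_start_ms):
        if abs(start_ms - target_ms) <= tolerance_ms:   # S440 / Yes in S450
            return index                                # playback starts at this packet
    return None                                         # no packet matched


# Leading video sample start times of three packets (4 s, 8 s and 12 s).
leading = [4000, 8000, 12000]
exact = find_start_packet(leading, 8000)
near = find_start_packet(leading, 8040, tolerance_ms=50)
missing = find_start_packet(leading, 6000)
print(exact, near, missing)
```

The number of comparisons is bounded by the number of packets rather than by the number of samples, which is the source of the reduced search workload.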

As explained up to this point, with the demultiplexer 300 in the fourth embodiment, searching only the video sample stored in the leading part of each packet at the time of performing random access playback of the MP4 file including the MP4 file extension part generated by the multiplexer in the above-mentioned first, second and third embodiments makes it possible to judge the video sample that should be the playback start position after random access, and thus the workload in searching samples at the time of random access is greatly reduced.

Application

Here, an application of the multiplexer in the present invention will be explained with reference to FIG. 28.

FIG. 28 is a diagram showing an application of the multiplexer in the present invention.

The multiplexer in the present invention may be applied to a mobile telephone with a recording function 403 or a personal computer 404 that obtains and multiplexes media data such as video data, audio data or the like and generates MP4 file data. Also, the demultiplexer in the present invention may be applied to a mobile telephone 407 that reads the generated MP4 file data and plays it back.

Here, the MP4 file data generated in the mobile telephone with a recording function 403 and the personal computer 404 are stored in a recording medium such as an SD memory card 405, a DVD-RAM 406 or the like, or are sent to the image distribution server 401 via the communication network 402 so as to be distributed from the image distribution server 401 to the mobile telephone 407 or the like.

In this way, the multiplexer and the demultiplexer in the present invention are used for an MP4 file generation apparatus or a playback apparatus in the image distribution system or the like.

Although the multiplexer and the demultiplexer in the present invention have been explained up to this point based on the respective embodiments and the like, the present invention is not limited to these embodiments and the like.

For example, coded data of MPEG-4 Visual is used as video data in the above-mentioned embodiments, but coded data produced by another video compression coding method such as MPEG-4 Advanced Video Coding (AVC), H.263 or the like may be used. Note that a single picture corresponds to a single sample in the coded data of MPEG-4 AVC or H.263.

Likewise, coded data of MPEG-4 Audio is used as audio data, but coded data produced by another audio compression coding method such as G.726 may be used as audio data.

Also, in the explanation of the above-mentioned embodiments, video data and audio data are used, but it is possible to obtain the effect of the present invention even in the case where text data and the like are included, by processing them in the same way as the audio data at the time of packetization.

Further, in the above-mentioned second embodiment, in the case where packetization is performed for each intra frame, it is possible to omit the time-based part adjustment unit 120 from the units of the packetization part determination unit 117 and to omit the processing of step S205 in FIG. 18.

Also, in the above-mentioned third embodiment, in the case where the MP4 file is played back according to a buffer model that is set in advance at a playback apparatus of the MP4 file, video sample data and audio sample data are interleaved and stored in mdat so that the buffer model is satisfied. Here, a buffer model is a model for guaranteeing that, in the case where coded data are inputted according to conditions prescribed in a standard, a playback apparatus having a buffer whose size is prescribed in the standard can perform decoding without the buffer becoming empty (underflow) and without data overflowing the buffer (overflow).

Also, in the above-mentioned first, second and third embodiments, the number of trafs stored in moof of the extension part of the MP4 file to be generated is not mentioned, but it is preferable that moof store a single traf per track. This makes it possible to obtain the header information of the samples of all the tracks stored in moof by analyzing only the leading traf in moof track by track, and thus the efficiency at the time of obtaining header information further improves.

Further, in the above-mentioned first, second and third embodiments, the entity data of the samples whose header information items are stored in moof of the extension part of the MP4 file to be generated are stored in a single mdat next to moof, but it is possible to divide the samples among a plurality of mdats following moof and store them. Specifically, the entity data items of the samples whose header information items are stored in moof_1 may be stored in mdat_1, mdat_2 and mdat_3 in this sequence, and the entity data items of the samples whose header information items are stored in moof_2 may be stored in mdat_4, mdat_5 and mdat_6 in this sequence.

Further, in the above-mentioned second and third embodiments, in the case where an intra frame of video data is included in the packet, the intra frame is placed in the leading part of the packet, but it is possible to place a video sample other than an intra frame, such as a predictive (P) frame or a bidirectionally predictive (B) frame, in the leading part of the packet on condition that the sample is randomly accessible. This will be explained below, taking as an example the case where coded data of MPEG-4 AVC are used as video data.

In MPEG-4 AVC, there is a case where a correct decoding result is not obtained even when decoding is started from an intra picture. More specifically, there are two types of intra pictures in MPEG-4 AVC: an instantaneous decoder refresh (IDR) picture and other intra pictures (called non-IDR intra pictures). It is always possible to obtain a correct decoding result when starting decoding from an IDR picture, but when starting from a non-IDR intra picture, a correct decoding result may not be obtained for the non-IDR intra picture and for a plurality of pictures after it in display order.

Therefore, in MPEG-4 AVC, it is possible to add supplemental enhancement information called “recovery point SEI”, which indicates from which picture decoding should be started in order to obtain a correct decoding result from the non-IDR intra picture.

For example, assume that five pictures indicated as Pic_1, Pic_2, Pic_3, Pic_4 and Pic_5 are included in the video data in this sequence, and that Pic_5 is a non-IDR intra picture. In the case where decoding must be started from Pic_1 in order to decode Pic_5 and the pictures after Pic_5 in display order, placing recovery point SEI immediately before Pic_1 makes it possible to indicate that decoding must be started from Pic_1 in order to decode Pic_5, which is the picture placed four pictures later in storage order in the video data, and the pictures after Pic_5 in display order.

In other words, Pic_1 is a randomly-accessible sample in this case. In the case of coded data of MPEG-4 AVC, it is possible to place a sample of an IDR picture, or of a picture to which recovery point SEI is added, in the leading part of the packet as a randomly-accessible sample. Further, randomly-accessible samples that do not have recovery point SEI can also be the leading sample in a packet. Note that recovery point SEI can be added to a picture other than an intra picture.
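The Pic_1 to Pic_5 example can be sketched as follows. This is only an illustration: the `Picture` class and the `recovery_distance` field, which counts pictures in storage order, are hypothetical simplifications and are not the actual syntax of recovery point SEI. Only IDR pictures and pictures carrying recovery point SEI are modelled as randomly accessible here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Picture:
    name: str
    is_idr: bool = False
    # Hypothetical field: number of pictures, in storage order, from the
    # picture carrying recovery point SEI to the first correctly decoded one.
    # None means that no recovery point SEI is attached.
    recovery_distance: Optional[int] = None


def is_randomly_accessible(pic: Picture) -> bool:
    """A picture may lead a packet if decoding can be started from it."""
    return pic.is_idr or pic.recovery_distance is not None


def first_correct_index(pictures: List[Picture], start: int) -> Optional[int]:
    """Index of the first correctly decoded picture when decoding from `start`."""
    pic = pictures[start]
    if pic.is_idr:
        return start  # an IDR picture itself decodes correctly
    if pic.recovery_distance is not None:
        return start + pic.recovery_distance
    return None  # decoding cannot be started here in this sketch


pictures = [
    Picture("Pic_1", recovery_distance=4),  # recovery point SEI placed here
    Picture("Pic_2"),
    Picture("Pic_3"),
    Picture("Pic_4"),
    Picture("Pic_5"),  # non-IDR intra picture, correct once reached from Pic_1
]
```

Starting from index 0 (Pic_1), `first_correct_index` yields index 4, that is Pic_5, which matches the picture placed four pictures later in storage order.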

At this time, it is possible to reduce the processing amount at the time of obtaining sample data by storing, in the same packet, a sample of the picture to which recovery point SEI is added and a sample of the picture that can first be decoded correctly after starting decoding from the picture to which recovery point SEI is added.

Further, it is possible to distinguish a sample of an IDR picture from a sample of a picture to which recovery point SEI is added based on the leading sample flag 930 or a specific flag value in the sample flag 935 (called the nonsynchronous sample flag). In MP4, the nonsynchronous sample flag can be set to 0 only for a sample, among randomly-accessible samples, for which the sample from which decoding is started is itself the sample which is correctly decoded. Therefore, it is possible to distinguish the two by setting the nonsynchronous sample flag to 0 in the sample of the IDR picture and to 1 in the sample of the picture to which recovery point SEI is added.
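A minimal sketch of this flag assignment is shown below. The `Picture` class and the function name are hypothetical, and samples that are not randomly accessible are left out of the distinction by returning `None`, which is a choice made for this illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Picture:
    name: str
    is_idr: bool = False
    has_recovery_point_sei: bool = False


def nonsynchronous_sample_flag(pic: Picture) -> Optional[int]:
    """Flag value used to tell the two kinds of randomly-accessible samples apart."""
    if pic.is_idr:
        return 0  # decoding start sample is itself correctly decoded
    if pic.has_recovery_point_sei:
        return 1  # correct decoding begins at a later sample
    return None   # not a randomly-accessible sample in this sketch


idr = Picture("IDR", is_idr=True)
recovery = Picture("RP", has_recovery_point_sei=True)
other = Picture("P")
```

A reader that finds a randomly-accessible sample can then branch on the returned value: 0 indicates an IDR picture, and 1 indicates a picture to which recovery point SEI is added.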

By using an identification method like the above, it is possible to differentiate randomly-accessible samples from each other based on their nature. In practice, this can be used as follows.

The first case is fast-forward playback performed by playing back only specific samples. At this time, since it is desirable that decoded samples can be displayed immediately, only samples whose nonsynchronous sample flag is 0 are decoded and played back.

The second case is starting playback from the middle of the content, or from the next area after skipping specific areas. At this time, since this applies only at the start of playback, the sample from which decoding is started may differ from the sample which is correctly decoded. Therefore, playback can be started either from a sample whose nonsynchronous sample flag is 0 or from a randomly-accessible sample whose nonsynchronous sample flag is 1.
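The two cases above can be sketched as two selection functions. The `Sample` class, the function names, and the rule of choosing the nearest randomly-accessible sample at or before the requested position are assumptions made for illustration. Samples that are not randomly accessible are given a flag value of 1 here, which is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    index: int
    randomly_accessible: bool
    nonsync_flag: int  # 0: decoding start sample is correctly decoded


def fast_forward_samples(samples: List[Sample]) -> List[int]:
    """First case: play back only samples that can be displayed immediately."""
    return [s.index for s in samples if s.randomly_accessible and s.nonsync_flag == 0]


def seek_start(samples: List[Sample], target: int) -> Optional[int]:
    """Second case: start from any randomly-accessible sample, flag 0 or 1.

    Assumed policy: pick the closest such sample at or before `target`.
    """
    candidates = [s.index for s in samples if s.randomly_accessible and s.index <= target]
    return max(candidates) if candidates else None


samples = [
    Sample(0, True, 0),   # IDR picture
    Sample(1, False, 1),
    Sample(2, True, 1),   # picture with recovery point SEI
    Sample(3, False, 1),
    Sample(4, True, 0),   # IDR picture
]
```

Here fast-forward playback uses samples 0 and 4 only, while a seek to position 3 can start decoding from sample 2 even though its nonsynchronous sample flag is 1.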

Note that this storage method is not limited to the case of recovery point SEI of MPEG-4 AVC; it is applicable to any case where the sample from which decoding is started differs from the sample which is correctly decoded. For example, it is applicable to a structure such as Open GOP (Group of Pictures) in MPEG-2 Video.

Further, in the case where identification information indicating that a sample is randomly accessible is available, it is possible to place the sample identified as randomly accessible by the identification information in the leading part of the packet.

INDUSTRIAL APPLICABILITY

The multiplexer in the present invention is suitable for a digital video camera, a mobile phone with a recording function or the like that generates MP4 file data by obtaining media data such as video data or audio data and stores it in a recording medium, or for a personal computer, a PDA or the like that distributes the generated MP4 file data via the Internet. The demultiplexer in the present invention is suitable for a personal computer, a mobile phone or the like that downloads and plays back the MP4 file data.

1. A multiplexer that generates multiplexed data by multiplexing packets of media data including image data and at least one of audio data and text data, comprising: a media data obtainment unit operable to obtain the media data; an analysis unit operable to analyze the media data obtained by the media data obtainment unit and obtain playback start time information that indicates a playback start time of a sample that is a smallest access unit of the image data, audio data and text data included in the media data; a packetization part determination unit operable to determine, based on the playback start time information obtained by the analysis unit, a packetization part of the media data in a way that playback start times of respective samples of the image data, audio data and text data that are included in the media data are made to be the same or approximately the same; a packet header part generation unit operable to generate a packet header part that holds a header of the media data on a basis of the packetization part determined by the packetization part determination unit; a packet data part generation unit operable to generate a packet data part that holds entity data of the media data on a basis of the packetization part determined by the packetization part determination unit; and a packetization unit operable to generate a packet by connecting the packet header part generated by the packet header part generation unit with the packet data part generated by the packet data part generation unit.
2. The multiplexer according to claim 1, wherein the packetization part determination unit makes the playback start times of a sample of the audio data placed in the leading part of the packetization part and a sample of the text data the same or approximately the same as the playback start time of a sample of the image data placed in the leading part of the packetization part.
3. The multiplexer according to claim 2, wherein the packetization part determination unit determines a sample of the audio data and a sample of the text data that are placed in the leading part of the packetization part as a sample whose playback start time is after the playback start time of a sample of the image data placed in the leading part of the packetization part and the earliest to the playback start time of a sample of the image data.
4. The multiplexer according to claim 2, wherein the packetization part determination unit determines a sample of the audio data and a sample of the text data that are placed in the leading part of the packetization part as a sample whose playback start time is before the playback start time of a sample of the image data placed in the leading part of the packetization part and the earliest to the playback start time of a sample of the image data.
5. The multiplexer according to claim 1, wherein the image data is video data, the analysis unit further analyzes the video data obtained by the media data obtainment unit and obtains intra frame information in the case where the video data includes at least one sample including the intra frame information indicating that the sample is an intra coded sample, the packetization part determination unit determines the media data as the packetization part based on the intra frame information and the playback start time information in the case where the analysis unit obtains the intra frame information.
6. The multiplexer according to claim 5, wherein the packetization part determination unit places a sample of the video data including the intra frame information in the leading part of the packetization part.
7. The multiplexer according to claim 6, wherein the packetization part determination unit makes playback start time of a sample of the video data including the intra frame information placed in the leading part of the packetization part the same or approximately the same as the playback start time of a sample of the audio data and a sample of the text data that are placed in the leading part of the packetization part.
8. The multiplexer according to claim 1, wherein the packet data part generation unit generates the packet data part for storing samples of the media data items included in the packetization part by interleaving in a way that the playback start times of the samples are in an ascending order.
9. The multiplexer according to claim 8, wherein the packet data part generation unit generates the packet data part for storing samples of the media data items included in the packetization part by interleaving in a way that a previously set condition is satisfied.
10. A multiplexing method for generating multiplexed data by multiplexing packets of media data including image data and at least one of audio data and text data, comprising: a media data obtainment step of obtaining the media data; an analyzing step of obtaining the playback start time information indicating playback start time of a sample that is the smallest access unit of the image data, audio data and text data included in the media data by analyzing the media data obtained in the media data obtainment step; a packetization part determination step of determining the packetization part of the media data making playback start times of respective samples of the image data, audio data and text data included in the media data based on the playback start time information obtained in the analysis step; a packet header part generation step of generating the packet header part that holds a header of the media data on a basis of the packetization part determined in the packetization part determination step; a packet data part generation step of generating the packet data part that holds entity data of the media data on a basis of the packetization part determined in the packetization part determination step; and a packetization step of generating a packet by connecting the packet header part generated in the packet header part generation step to the packet data part generated in the packet data part generation step.
11. The multiplexer according to claim 10, wherein, in the packetization part determination step, playback start times of the audio data and the text data that are placed in the leading part of the packetization part is made to be the same or approximately the same as the playback start time of a sample of the image data placed in the leading part of the packetization part.
12. The multiplexing method according to claim 10, wherein the image data is video data, in the analysis step, further, the intra frame information is obtained in the case where at least one sample including intra frame information indicating that the video data is an intra coded sample is included by analyzing the video data obtained in the media data obtainment step, and in the packetization part determination step, the packetization part of the media data is determined based on the intra frame information and the playback start time information in the case where the intra frame information is obtained in the analysis step.
13. The multiplexing method according to claim 12, wherein, in the packetization part determination step, a sample of the video data including the intra frame information is placed in the leading part of the packetization part.
14. The multiplexing method according to claim 13, wherein, in the packetization part determination step, playback start times of a sample of the audio data and a sample of the text data that are placed in the leading part of the packetization part are made to be the same or approximately the same as playback start time of a sample of the video data including the intra frame information placed in the leading part of the packetization part.
15. The multiplexing method according to claim 10, wherein, in the packet data part generation step, the packet data part for storing samples of the media data items included in the packetization part is generated by interleaving in a way that playback start times of the samples are in an ascending order.
16. A program for a multiplexer that generates multiplexed data by multiplexing packets of media data including image data and at least one of audio data and text data, the program causing a computer to execute steps in a multiplexing method comprising: a media data obtainment step of obtaining the media data; an analysis step of obtaining playback start time information indicating playback start time of a sample that is a smallest access unit of the image data, audio data and text data included in the media data by analyzing the media data obtained in the media data obtainment step; a packetization part determination step of determining, based on the playback start time information obtained in the analysis step, a packetization part of the media data in a way that playback start times of respective samples of the image data, audio data and text data that are included in the media data are made to be the same or approximately the same; a packet header part generation step of generating a packet header part that holds a header of the media data on a basis of packetization part determined in the packetization part determination step; a packet data part generation step of generating the packet data part that holds entity data of the media data on a basis of the packetization part determined in the packetization part determination step; and a packetization step of generating a packet by connecting the packet header part generated in the packet header part generation step and a packet data part generated in the packet data part generation step.
17. A demultiplexer that obtains multiplexed data where media data including image data and at least one of audio data and text data are included is multiplexed on a basis of a predetermined packetization part, comprising: a multiplexed data obtainment unit operable to obtain the multiplexed data; an analysis demultiplex unit operable to analyze the multiplexed data obtained by the multiplexed data obtainment unit, demultiplexes a header part of the packet from the multiplexed data and obtains the header part; and a random access searching unit operable to search only a header of a sample of the image data placed in the leading part of the packet header part demultiplexed by the analysis demultiplexing unit and judges whether intra frame information indicating that the sample of the image data included in the packet is an intra coded sample or not at the time of executing random access that is the processing for changing a starting position of demultiplexing of the multiplexed data or starting demultiplexing in the middle of the multiplexed data.